Objectives:

Brain aneurysms managed conservatively with follow-up carry a risk of aneurysm rupture.

The objectives of this systematic review were to:

  1. Estimate the overall rupture risk of a SIUA managed with surveillance
  2. Examine whether aneurysm size or exposure to prior subarachnoid haemorrhage were associated with rupture risk
  3. Examine other potential sources of heterogeneity in the overall rupture risk estimate.

This will help clinicians convey to patients, more accurately, the uncertainty in the rupture risk estimate based on current medical knowledge.

After performing my systematic review, I have three additional aims:

  1. If appropriate, to quantitatively synthesise the data in order to calculate a summary pooled risk estimate.
  2. To identify and measure heterogeneity across the studies to help decide whether data synthesis is appropriate.
  3. If data synthesis is appropriate, to explore the reasons for heterogeneity by considering outliers and influential studies, and both pre-specified and post hoc subgroup analyses and metaregression.

The statistical approaches will be tailored to the source dataset and structure of the data which is characterised by the following:

  1. Rupture of an intracranial aneurysm in a single patient is a rare event, usually estimated at 1% per year or less.
  2. There are a small number of rupture outcome events in the sample populations, with the rupture proportion very close to zero.
  3. While the outcome of rupture is binomial, in a sample population the distribution of rupture outcomes over time is better described by the non-central hypergeometric distribution, since some aneurysms are considered at higher risk of rupture.
  4. There is a wide range of sample sizes in the literature, with both very small and very large samples.
  5. The rate of rupture over time does not appear to be constant when biological factors and current published data are considered; thus the risk of rupture, rather than a constant rupture rate, will be considered.

To achieve these aims, the required packages and the source dataset are loaded in R, and each part is carried out with an explanation of the rationale for the data analysis. Version control is carried out with GitHub.

Load required packages

The following R packages were loaded: tidyverse, meta, metafor, BiasedUrn and dmetar.

Load required data

A single Excel data file is loaded to carry out all data analysis, with all steps documented below to ensure reproducibility of the research.

Correct vector types

Vector types are corrected to integers, factors, doubles as appropriate for analysis in R, and stored as a new tibble.

Calculate individual study proportions and use meta-analytical methods to create a pooled summary proportion

Individual study rupture risk can be calculated as the proportion of aneurysms that ruptured, ie pi = xi / ni, where:

  • xi = cases, ie the number of aneurysm ruptures during follow-up
  • ni = total, ie the number of aneurysms at study entry
  • pi = raw proportion, ie xi / ni
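As a minimal illustration with hypothetical counts (not taken from the dataset):

```r
# Hypothetical study: 3 ruptures among 400 aneurysms at study entry
x_i <- 3              # cases: ruptures during follow-up
n_i <- 400            # total: aneurysms at study entry
p_i <- x_i / n_i      # raw proportion
p_i * 100             # expressed as ruptures per 100 aneurysms
```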

These can then be combined across studies in a meta-analysis of proportions. For any meta-analysis, the basic steps are the same: for every dataset, a suitable effect measure must be chosen, and a choice must be made regarding the meta-analytical method. Most meta-analytical methods weight the individual effect sizes from each study to create a pooled effect size. In this study, we will consider individual study proportions, and create a pooled summary proportion for rupture risk.

Given the unique characteristics of the data being synthesised, appropriate statistical methods for meta-analysis and for calculating confidence intervals must be considered. This is important because different statistical approaches to synthesising the data can produce different results, leading to different conclusions. Cornell et al AIM 2014

Choice of meta-analytical method

The dataset is sparse, and the overall distribution of rupture events across all studies is highly skewed towards zero.

The first step is to improve the statistical properties of a skewed dataset, in terms of its distribution and variance, prior to synthesis. Variance is the average squared deviation from the mean. To achieve this, the data are transformed to approximate a normal distribution, enhancing the validity of the subsequent statistical procedures. The most common transformation methods are the logit and the double arcsine transformation.

In the logit transformation, proportions are converted to the log-odds scale, ie the logit, log(pi / (1 − pi)), and assumed to follow a normal distribution. All statistical procedures are performed on the logit-transformed proportions, which are then converted back to raw proportions for reporting. However, if there are no ruptures and pi = 0, the logit and its variance are undefined. Furthermore, if there is high between-study variability or small study sample sizes, the logit-transformed proportion is underestimated. Hamza 2008 To overcome this statistical limitation, a continuity correction of 0.5 is typically applied. In our analysis, this creates a risk of introducing additional sparse-data bias and reducing the validity of the result, especially given that pi is close to 0.
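The zero-event problem can be seen directly with hypothetical counts (qlogis and plogis are base R's logit and inverse-logit functions):

```r
x <- 0; n <- 173               # hypothetical study with zero ruptures
qlogis(x / n)                  # logit(0) is -Inf: undefined for synthesis

# Conventional 0.5 continuity correction: add 0.5 to each cell,
# so events become x + 0.5 and the denominator becomes n + 1
x_cc <- x + 0.5; n_cc <- n + 1
logit_p <- qlogis(x_cc / n_cc) # now finite
plogis(logit_p)                # back-transform to a (small, nonzero) proportion
```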

The alternative transformation to consider, which better normalises the distribution and stabilises the variance, especially for small sample sizes or rare event rates, is the Freeman-Tukey double arcsine transformation. Freeman and Tukey 1950 After the statistical procedures, the result can later be back-transformed using the equation derived by Miller. Miller 1978 However, the Miller back-transformation utilises the harmonic mean of the sample sizes. This affects the back-transformed proportion, as described by Schwarzer 2019, and can lead to misleading results, particularly when there is a large range of sample sizes. This is of particular concern given that the sample sizes in our dataset vary from 22 to 3323.
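A sketch of the forward transformation, using one common parameterisation (assumed here; metafor's transf.pft implements the canonical version), whose variance is approximately 1 / (4n + 2):

```r
# Freeman-Tukey double arcsine (one common parameterisation, assumed here)
ft <- function(x, n) {
  0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))
}

ft(0, 22)     # defined even when x = 0, so no continuity correction is needed
ft(0, 3323)   # much closer to zero for a large study with the same zero count
```

Note how the transformed value for a zero-event study depends on n, which is why back-transforming a pooled value with a single harmonic-mean sample size can mislead when sample sizes vary widely.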

In the classical meta-analytical methods, the statistical procedures are performed on the transformed study-level results, which are synthesised and then back-transformed for reporting.

Data synthesis can be carried out using the generic inverse variance method, which calculates the individual study effect size, and creates a pooled estimate after weighting each study by the inverse of the variance of its effect size estimate. This means that larger studies with smaller standard errors are given more weight than smaller studies with larger standard errors. This minimises the imprecision of the pooled effect size result.
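The weighting can be sketched with toy numbers (the effect sizes and standard errors below are purely illustrative):

```r
# Generic inverse-variance pooling with illustrative numbers
theta <- c(0.8, 1.2, 1.0)     # hypothetical study effect sizes
se    <- c(0.40, 0.10, 0.25)  # hypothetical standard errors
w     <- 1 / se^2             # inverse-variance weights

pooled <- sum(w * theta) / sum(w)  # weighted average
round(w / sum(w), 3)               # the most precise study dominates the weights
pooled
```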

A variation on the generic inverse variance method incorporates an assumption that there is some variation between studies, ie heterogeneity, while measuring the same effect. This produces a random effects meta-analysis, the most common of which is the DerSimonian and Laird (DL) method. DerSimonian 1986 This is the most commonly utilised method in medical meta-analysis, and the default method in many statistical packages. However the DL method is known to produce confidence bounds that are too narrow and p values that are too small when the number of studies is small or heterogeneity is high. In addition, when outcome events are rare, as in our dataset, these classical meta-analytical methods for data synthesis can also contribute additional sparse-data bias and give misleading results. Bradburn 2007

In summary, there are significant statistical limitations of the classical DL meta-analytical methods of transformation and synthesis with our specific dataset and research question.

To overcome these statistical limitations, we can instead utilise the random intercept logistic regression model for statistical procedures and data synthesis, a type of generalised linear mixed model, as recommended by Stijnen 2010 and Schwarzer 2019.

Our rationale for choice of a GLMM for statistical procedures and data synthesis is based on the following:

  • The GLMM is an exact likelihood model based on the binomial distribution, and thus accounts for the binomial outcome of aneurysm rupture. Hamza 2008 The normal approximation to the binomial is only reasonable for large samples with a common outcome.
  • The GLMM estimates the logit proportion more accurately, with better coverage of the corresponding confidence intervals, than the standard normal-approximation logit transformation, especially when there are small sample sizes and greater between-study heterogeneity. Hamza 2008
  • The sample sizes range from 22 to 3323, which creates misleading results when utilising the alternative Freeman-Tukey transformation and Miller back-transformation. Schwarzer 2019
  • The outcome is very rare, with proportions less than 0.05 in almost all studies, and thus the generic inverse variance and DL methods for data synthesis can lead to misleading results. Stijnen 2010
  • GLMMs can be fitted using the logit transformation, and results backtransformed to the original scale, in our case a proportion.
  • The GLMM does not require a continuity correction even if pi = 0. Hamza 2008
  • Additional calculations to consider covariates can be performed using noncentral hypergeometric models.

The limitation of this statistical approach is that the individual study weights used to pool the studies into the pooled proportion will not be available.

In the past, utilisation of a GLMM for meta-analysis was not practical due to its computationally intensive nature, and its lack of availability in standard statistical software. These limitations have now been overcome, with modules for GLMMs available in both SAS and R.

Choice of confidence interval

Use of the CI is important, since CIs convey information about the magnitude and precision of individual study effect sizes and the pooled meta-analytical effect size. The choice of CI should be tailored to the dataset at hand. Options include:

  • Wald method or Normal Approximation
    • produces CIs that are symmetric about the point estimate
    • thus often unsuitable for proportions close to the boundaries of 0 and 1, since the interval can overshoot below 0 or above 1
    • for proportions at or near zero, a continuity correction may be applied, but this can lead to additional overshoot
  • Clopper-Pearson or Exact method
    • Most commonly used, and recommended by most statistical textbooks to avoid approximation.
    • Called exact because it is based directly on the binomial distribution rather than an approximation.
    • Output is usually conservative, with true coverage almost always larger than the nominal coverage, ie closer to a 99% CI than a 95% CI.
    • The derived 95% CI does not accurately reflect the true 95% CI unless n is quite large Agresti 2008.
    • When n is small eg 10, there is severe overcoverage (closer to 99% coverage), with the true CI much larger than the derived 95% CI.
    • Even when n is large, the derived 95% CI does not accurately reflect the true 95% CI when p is near 0, because the binomial distribution is highly skewed. Agresti 1998
    • Needs a very large n to be accurate Brown 2001
  • Wilson method
    • Suggested for small n, ie 40 or less, and/or extreme probabilities Brown 2001
    • The derived CI more accurately reflects the true CI, with less variability, compared to the CP method Agresti 1998
    • For small n, the Wilson CIs are shorter and more accurately derived than the CP CIs Vollset 1993
    • Can be used, and is recommended, for all sample sizes Newcombe 1998

We will choose the Wilson method for CI for the following reasons:

  • Rupture of an aneurysm is a rare event with the proportion p very close to zero
  • The distribution of the outcomes is highly skewed towards zero, and thus Wald confidence intervals would overshoot below zero.
  • There is a wide range of sample sizes, with very small studies and very large studies included
  • More accurate CIs are likely to be generated using the Wilson method given the highly skewed and sparse dataset.

This is aligned with the recommendations of Vollset 1993, Agresti 1998, Newcombe 1998 and Brown 2001.
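The contrast between the Wald and Wilson intervals can be sketched with a hypothetical sparse study (1 rupture in 100 aneurysms); note that base R's prop.test with correct = FALSE returns the Wilson score interval:

```r
# Hypothetical sparse study: 1 rupture among 100 aneurysms
x <- 1; n <- 100
p <- x / n
z <- qnorm(0.975)

# Wald (normal approximation) interval: symmetric about p, overshoots below 0
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
wald

# Wilson score interval via prop.test (correct = FALSE disables the
# continuity correction): respects the [0, 1] boundary
wilson <- prop.test(x, n, correct = FALSE)$conf.int
wilson
```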

Analyse total cohort of all aneurysms 10 mm and less

Assumptions:

Consider number of patients vs number of aneurysms for ≤ 10mm aneurysms

Some studies report the number of patients for ≤ 10mm aneurysms, while others report the number of aneurysms, and others report both the number of patients and aneurysms at study entry. The proportion of patients with multiple aneurysms is also generally reported.

Given that the outcome is rupture of 1 aneurysm in 1 patient, and that the number of ≤10 mm aneurysms has complete data capture, the analysis is performed with the outcome of number of aneurysm ruptures per 100 aneurysms ≤10 mm.

The rationale for this is:

  1. Conveying a rupture risk per aneurysm is easy to understand from a patient perspective when they harbour multiple aneurysms.
  2. Makes biological sense, since the 1 aneurysm that ruptures is considered to be strongly influenced by aneurysm characteristics
  3. One patient may have multiple aneurysms, with a large range, eg 1 to 8 aneurysms

Aneurysm multiplicity is taken into account by extracting the total number of aneurysms for that size cohort.

The hierarchy is the following:

  1. Use the most accurate data ie number of total aneurysms for size cohort.
  2. If the above is not reported, then apply the cohort's proportion of patients with multiple aneurysms to the number of patients in that size cohort. Each patient with multiple aneurysms is assumed to harbour 2 aneurysms.
  3. If the multiplicity proportion is not reported, then each person in that size category harbours 1 aneurysm.

For patients with exposure to prior SAH, the number of aneurysms in patients with exposure to prior SAH is unknown ie these patients too may have multiple aneurysms.

We can utilise the same methodology above ie

  1. Apply same proportion of patients with multiple aneurysms to number of patients for that size cohort with exposure to prior SAH. Each patient with prior SAH and multiple aneurysms is assumed to harbour 2 aneurysms.
  2. If the multiplicity proportion is not reported, then each person with prior SAH in that size category harbours 1 unruptured aneurysm.

In addition, if the number of patients with exposure to prior SAH in the specific size category is not reported, then we can assume that the proportion of patients with prior SAH in the total observation cohort is the same proportion as those in that size category.

Thus if we know the total number of aneurysms in that size category, we can calculate the proportion of aneurysms that have had exposure to prior SAH.
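A hypothetical worked example of these assumptions (all numbers are illustrative; the variable names mirror those used in the data wrangling code):

```r
# Illustrative counts only, not taken from the dataset
num        <- 120   # patients with aneurysms <= 10 mm
multi_tot  <- 18    # patients with multiple aneurysms in the whole cohort
num_tot    <- 200   # patients in the whole cohort
psah       <- 30    # patients <= 10 mm with prior SAH exposure

prop_multi <- multi_tot / num_tot             # multiplicity proportion: 0.09

# Hierarchy step 2: each multiple-aneurysm patient assumed to harbour 2 aneurysms
total_size <- round(num * (1 + prop_multi))   # estimated aneurysms in the size cohort

# Same multiplicity assumption applied to the prior-SAH subgroup
psah_size  <- round(psah * (1 + prop_multi))  # estimated aneurysms with prior SAH
```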

Size specific data assumptions.

The primary outcome is for aneurysms measuring ≤10 mm, for which data reporting is complete. However, for the sensitivity analyses by size stratum, not all studies report consistently for each stratum (≤3 mm, ≤5 mm, ≤7 mm and ≤10 mm). Note that an aneurysm ≤3 mm also satisfies the criteria for ≤5, ≤7 and ≤10 mm; the most accurate data were extracted for each size threshold, and additional unpublished data were sought for the size strata analyses.

This can introduce additional bias, and sensitivity analyses at the various size thresholds will be performed to examine the impact of these decisions.

Data wrangling

The following steps are carried out

  1. Calculate total number of aneurysms with new variable total_size
  2. Calculate total aneurysms with prior SAH exposure psah_size
  3. Categorise age at 65 years
  4. Categorise source population as Japanese or non-Japanese.
  5. Substitute median follow-up values for missing mean values as best estimates of sample means.
dat <- sizedata10 %>%
  mutate(prop_multi = multi_tot / num_tot,          # proportion of patients with multiple aneurysms
         num_multi = prop_multi * num + num,        # patients inflated by assumed 2nd aneurysm
         num_multi_temp = coalesce(num_anr, num_multi),
         total_size_temp = coalesce(num_anr, num),
         total_size_temp_2 = coalesce(num_multi_temp, total_size_temp),
         total_size = round(total_size_temp_2, 0),  # total aneurysms in the size cohort
         psah_size_temp = psah * prop_multi + psah,
         prop_psah = psah_tot / num_tot,
         num_anr_psah = prop_psah * total_size,
         size_psah_temp = coalesce(psah_size_temp, num_anr_psah),
         psah_size = round(size_psah_temp, 0)       # aneurysms with prior SAH exposure
         ) %>%
  mutate(fu = coalesce(fu_mean_tot, fu_med_tot)) %>%  # median follow-up fills missing means
  mutate(age = coalesce(age_mean, age_med)) %>%
  mutate(age_cat = case_when(
    age <= 65 ~ "≤65-years",
    age > 65 ~ ">65-years"
    )) %>%
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  mutate(pop = fct_collapse(country,
                            "Japanese" = "Japan",
                            "Non-Japanese" = c("International", "United States", "Switzerland", "Australia", "Korea", "Poland", "China", "Germany", "United Kingdom", "Finland"))
         )

Meta-analysis of proportions using GLMM and Wilson CIs with all studies

Now that we have taken into account the number of aneurysms specific to that size category, we can now perform our statistical procedures on that dataset, starting with a meta-analysis of proportions using the GLMM.

Note the utilisation of the following:

  1. GLMM ie method = "GLMM"
  2. Logit transformation ie sm = "PLOGIT"
  3. Maximum likelihood estimator for tau^2 ie method.tau = "ML"
  4. Wilson confidence interval for individual studies ie method.ci = "WS"
  5. Hartung-Knapp adjustment, given the small samples included, ie hakn = TRUE
  6. Reporting per 100 aneurysms ie pscale = 100
dat.all <- dat 
  
pes.summary.glmm.all = metaprop(rupt, total_size,
                            data=dat.all,
                            studlab=paste(auth_year),
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm.all
##                   events             95%-CI
## Bor 2015          0.7444 [ 0.2535;  2.1655]
## Broderick 2009    1.8519 [ 0.5093;  6.5019]
## Burns 2009        0.5780 [ 0.1021;  3.2011]
## Byoun 2016        1.7628 [ 0.9871;  3.1288]
## Choi 2018         0.5780 [ 0.1021;  3.2011]
## Gibbs 2004        0.0000 [ 0.0000; 14.8655]
## Gondar 2016       0.8152 [ 0.2776;  2.3691]
## Guresir 2013      0.7812 [ 0.2660;  2.2715]
## Irazabal 2011     0.0000 [ 0.0000;  7.8652]
## Jeon 2014         0.3521 [ 0.0966;  1.2746]
## Jiang 2013        0.0000 [ 0.0000;  7.1348]
## Juvela 2013      18.6747 [13.4796; 25.2868]
## Loumiotis 2011    0.0000 [ 0.0000;  2.3446]
## Matsubara 2004    0.0000 [ 0.0000;  2.3736]
## Matsumoto 2013    2.6549 [ 0.9069;  7.5160]
## Mizoi 1995        0.0000 [ 0.0000; 15.4639]
## Morita 2012       1.8658 [ 1.4582;  2.3845]
## Murayama 2016     2.2384 [ 1.6660;  3.0014]
## Oh 2013           0.0000 [ 0.0000; 16.8179]
## Serrone 2016      0.5155 [ 0.0911;  2.8615]
## So 2010           1.1450 [ 0.3902;  3.3118]
## Sonobe 2010       1.5625 [ 0.7589;  3.1897]
## Teo 2016          2.3810 [ 0.8130;  6.7666]
## Tsukahara 2005    3.4722 [ 1.4921;  7.8703]
## Tsutsumi 2000     4.3478 [ 1.4896; 12.0212]
## Villablanca 2013  1.5544 [ 0.5300;  4.4697]
## Wiebers-R 1998    1.3503 [ 0.8558;  2.1244]
## Wiebers-P 2003    1.0204 [ 0.6193;  1.6768]
## Wilkinson 2018    0.0000 [ 0.0000; 14.8655]
## Zylkowski 2015    2.7273 [ 0.9318;  7.7131]
## 
## Number of studies combined: k = 30
## 
##                      events           95%-CI
## Fixed effect model   1.7160 [1.5077; 1.9525]
## Random effects model 1.2378 [0.8149; 1.8760]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.7571; tau = 0.8701; I^2 = 83.2%; H = 2.44
## 
## Test of heterogeneity:
##       Q d.f.  p-value             Test
##  179.23   29 < 0.0001        Wald-type
##  152.15   29 < 0.0001 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations

Calculate mean follow up in years.

dat.all %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  4.36

The output from the random effects meta-analysis using the GLMM:

Summary estimate is 1.2378 [0.8149; 1.8760] over study level mean of 4.4 years.

Note that for GLMMs no continuity correction is used (see the meta documentation). Given our rationale for choosing the GLMM, this should produce the least biased result and reasonable coverage probabilities for the 95% CI, as suggested by Stijnen 2010. Note that the individual study CIs use the Wilson score method.

Remember that the confidence interval from this random-effects meta-analysis describes uncertainty in the location of the average proportion across the individual studies. Thus there is likely to be an even higher degree of uncertainty in the true population rupture risk. Cochrane handbook 10.10.4.2

Display meta-analysis result using a publication quality forest plot

The easiest way to communicate the result of the statistical procedure is via a forest plot.

forest(pes.summary.glmm.all,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 4.4 years",
       JAMA.pval = TRUE
       )

Heterogeneity

What is heterogeneity

Heterogeneity refers to all types of variation across studies. Heterogeneity can be considered in terms of

  1. Clinical heterogeneity, which refers to variability in characteristics of patients, exposures, interventions and outcomes.

  2. Methodological heterogeneity, which refers to variability in study design, eg randomization, blinding etc.

  3. Statistical heterogeneity, which refers to the situation where variation across studies is greater than the random error expected within studies.

Clinical and methodological heterogeneity cannot be calculated; both are subjectively evaluated by the meta-analyst. If clinical and methodological differences are not substantial, then quantitative synthesis is considered appropriate.

If there is substantial clinical and/or methodological heterogeneity, a quantitative synthesis should not be performed to create a pooled effect measure; a qualitative or narrative systematic review should be performed instead. This is because the observed differences in effects across studies are greater than expected by random chance alone, which limits the generalisability of a pooled result. The analogy is that a systematic review brings together apples and oranges, and combining these in the setting of high heterogeneity yields a meaningless result.

Since statistical heterogeneity is a consequence of clinical and/or methodological heterogeneity, and these always occur to some degree, statistical heterogeneity in a meta-analysis is inevitable. Higgins et al 2003

How to identify and measure statistical heterogeneity

Overall, identification of heterogeneity is critical to assist in interpreting the results of the meta-analysis. This gives the reader confidence that the effect demonstrated is generalisable. It is impossible to completely avoid heterogeneity, however clinical and methodological heterogeneity can be minimised by using strict inclusion criteria during the systematic review.

A simple way to identify heterogeneity is to review the forest plot and see whether the confidence intervals overlap. If there is poor overlap, there is statistical heterogeneity.

Statistical heterogeneity can also be measured using Cochran's Q statistic, the tau-squared statistic or the inconsistency (I^2) statistic.

  1. Cochran's Q. This test performs poorly with small numbers of studies. If the p value for the Q statistic is below 0.05, this suggests significant between-study heterogeneity.
  2. Tau-squared statistic. Uncommonly reported. An estimate of the between-study variance; a tau-squared of zero indicates no heterogeneity.
  3. I^2 inconsistency statistic. I^2 represents the proportion of total variability that can be attributed to true between-study heterogeneity rather than chance, and can be classified into degrees of heterogeneity.

While categorisation of I^2 is subjective, we will interpret heterogeneity according to the Cochrane Collaboration thresholds: 0–40% might not be important; 30–60% may represent moderate heterogeneity; 50–90% may represent substantial heterogeneity; 75–100% considerable heterogeneity.
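As a quick sketch of how I^2 relates to the other heterogeneity statistics, I^2 can be recovered from H (where H^2 is the ratio of Q to its degrees of freedom) via I^2 = (H^2 − 1) / H^2. Using the H = 2.44 reported in the meta-analysis output above:

```r
# I^2 from H: I^2 = (H^2 - 1) / H^2, using H = 2.44 from the output above
H  <- 2.44
I2 <- (H^2 - 1) / H^2
round(100 * I2, 1)   # 83.2, matching the reported I^2 = 83.2%
```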

We will choose the I^2 statistic for the following reasons:

  1. This measure is relatively unaffected by the number of studies included.
  2. The output offers a percentage that is easily interpreted
  3. This is the standard method of reporting for Cochrane reviews.
  4. 95% CIs can also be calculated for the I^2 statistic to demonstrate the uncertainty of the result.
pes.summary.glmm.all
##                   events             95%-CI
## Bor 2015          0.7444 [ 0.2535;  2.1655]
## Broderick 2009    1.8519 [ 0.5093;  6.5019]
## Burns 2009        0.5780 [ 0.1021;  3.2011]
## Byoun 2016        1.7628 [ 0.9871;  3.1288]
## Choi 2018         0.5780 [ 0.1021;  3.2011]
## Gibbs 2004        0.0000 [ 0.0000; 14.8655]
## Gondar 2016       0.8152 [ 0.2776;  2.3691]
## Guresir 2013      0.7812 [ 0.2660;  2.2715]
## Irazabal 2011     0.0000 [ 0.0000;  7.8652]
## Jeon 2014         0.3521 [ 0.0966;  1.2746]
## Jiang 2013        0.0000 [ 0.0000;  7.1348]
## Juvela 2013      18.6747 [13.4796; 25.2868]
## Loumiotis 2011    0.0000 [ 0.0000;  2.3446]
## Matsubara 2004    0.0000 [ 0.0000;  2.3736]
## Matsumoto 2013    2.6549 [ 0.9069;  7.5160]
## Mizoi 1995        0.0000 [ 0.0000; 15.4639]
## Morita 2012       1.8658 [ 1.4582;  2.3845]
## Murayama 2016     2.2384 [ 1.6660;  3.0014]
## Oh 2013           0.0000 [ 0.0000; 16.8179]
## Serrone 2016      0.5155 [ 0.0911;  2.8615]
## So 2010           1.1450 [ 0.3902;  3.3118]
## Sonobe 2010       1.5625 [ 0.7589;  3.1897]
## Teo 2016          2.3810 [ 0.8130;  6.7666]
## Tsukahara 2005    3.4722 [ 1.4921;  7.8703]
## Tsutsumi 2000     4.3478 [ 1.4896; 12.0212]
## Villablanca 2013  1.5544 [ 0.5300;  4.4697]
## Wiebers-R 1998    1.3503 [ 0.8558;  2.1244]
## Wiebers-P 2003    1.0204 [ 0.6193;  1.6768]
## Wilkinson 2018    0.0000 [ 0.0000; 14.8655]
## Zylkowski 2015    2.7273 [ 0.9318;  7.7131]
## 
## Number of studies combined: k = 30
## 
##                      events           95%-CI
## Fixed effect model   1.7160 [1.5077; 1.9525]
## Random effects model 1.2378 [0.8149; 1.8760]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.7571; tau = 0.8701; I^2 = 83.2%; H = 2.44
## 
## Test of heterogeneity:
##       Q d.f.  p-value             Test
##  179.23   29 < 0.0001        Wald-type
##  152.15   29 < 0.0001 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations

I^2 = 83.2%, which is high: the between-study variation due to clinical and methodological differences is greater than expected from random chance variation within studies. This may be explained by covariates, so consideration should be given either to avoiding data synthesis or to exploring the reasons for heterogeneity.

For interest, tau-squared = 0.7571 and the p value for the Q statistic is < 0.0001, which also confirm high between-study heterogeneity.

We could also have identified the heterogeneity by reviewing the overlap between confidence intervals in the forest plot, where there is a clear outlier: Juvela et al.

How to explore and address heterogeneity

There are various strategies to explore and address heterogeneity.

  1. Check for manual data entry errors. If data have been entered into the statistical software incorrectly, the results will be inaccurate. By utilising a single Excel file as the source of data, with no additional manual data entry, this risk is minimised.

  2. Check that a random-effects meta-analysis was performed. Clinical and methodological heterogeneity is always present in medical studies, and in our case a random-effects model has been used.

  3. Avoid quantitative data synthesis and pooling studies. If clinical and methodological heterogeneity are substantial, then a quantitative data synthesis should not be performed. A narrative or qualitative review should be performed instead.

  4. Choosing to pool the data while exploring heterogeneity. Heterogeneity can be explored through metaregression to help identify study-level effect modifiers, ie where the outcome varies with study-level clinical or methodological characteristics. Such metaregression analyses are best pre-specified; regardless, they should be interpreted with caution and considered hypothesis generating.

  5. Exclude studies. One or two studies may be outliers or have high influence on the pooled effect size. In general, exclusion of studies should be avoided, since it may introduce bias. However, if there is an obvious reason for the outlying result, then the study can be excluded with confidence. Overall, it is often best to perform the analysis both with and without the outlier / influential study as part of a sensitivity analysis.

In our situation, the forest plot reveals an outlier with the Juvela et al study. To commence our investigation into heterogeneity, we will first consider outlier and influence analysis.

Outlier studies with extreme effect sizes can increase between-study heterogeneity. This may arise from small or low-quality studies. In addition, overly influential studies can push the pooled effect size higher or lower, making the pooled result less robust since it relies on one or a small number of studies.

There are a number of ways to detect outliers and influential studies.

Outlier assessment

One way to identify statistical outliers is to check whether the confidence interval of a study fails to overlap with that of the pooled effect size. Such an outlier has an extreme effect size, and its inclusion reduces the robustness of the pooled effect size result, so its influence should be examined before it is pooled with the total study population.

Common criteria for detecting outliers are studies where:

  1. The upper bound of the study's 95% CI is lower than the lower bound of the pooled effect 95% CI (ie, an extremely small effect)
  2. The lower bound of the study's 95% CI is higher than the upper bound of the pooled effect 95% CI (ie, an extremely large effect)
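These two rules can be applied by hand against the fixed-effect pooled CI reported in the meta-analysis output above ([1.5077; 1.9525] per 100 aneurysms); three studies are shown for illustration, with bounds copied from that output:

```r
# Manual sketch of the two outlier rules against the fixed-effect pooled CI
pooled_lo <- 1.5077; pooled_hi <- 1.9525

study <- c("Jeon 2014", "Juvela 2013", "Bor 2015")
ci_lo <- c(0.0966, 13.4796, 0.2535)   # study 95% CI lower bounds, from the output
ci_hi <- c(1.2746, 25.2868, 2.1655)   # study 95% CI upper bounds, from the output

# Rule 1: study upper bound below pooled lower bound (extremely small effect)
# Rule 2: study lower bound above pooled upper bound (extremely large effect)
outlier <- ci_hi < pooled_lo | ci_lo > pooled_hi
study[outlier]   # "Jeon 2014" "Juvela 2013"
```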

The current results show that the pooled rupture risk is 1.2378 [0.8149; 1.8760]. So let's identify studies that are outliers.

find.outliers(pes.summary.glmm.all)
## Identified outliers (fixed-effect model) 
## ---------------------------------------- 
## "Jeon 2014", "Juvela 2013" 
##  
## Results with outliers removed 
## ----------------------------- 
##                   events             95%-CI exclude
## Bor 2015          0.7444 [ 0.2535;  2.1655]        
## Broderick 2009    1.8519 [ 0.5093;  6.5019]        
## Burns 2009        0.5780 [ 0.1021;  3.2011]        
## Byoun 2016        1.7628 [ 0.9871;  3.1288]        
## Choi 2018         0.5780 [ 0.1021;  3.2011]        
## Gibbs 2004        0.0000 [ 0.0000; 14.8655]        
## Gondar 2016       0.8152 [ 0.2776;  2.3691]        
## Guresir 2013      0.7812 [ 0.2660;  2.2715]        
## Irazabal 2011     0.0000 [ 0.0000;  7.8652]        
## Jeon 2014         0.3521 [ 0.0966;  1.2746]       *
## Jiang 2013        0.0000 [ 0.0000;  7.1348]        
## Juvela 2013      18.6747 [13.4796; 25.2868]       *
## Loumiotis 2011    0.0000 [ 0.0000;  2.3446]        
## Matsubara 2004    0.0000 [ 0.0000;  2.3736]        
## Matsumoto 2013    2.6549 [ 0.9069;  7.5160]        
## Mizoi 1995        0.0000 [ 0.0000; 15.4639]        
## Morita 2012       1.8658 [ 1.4582;  2.3845]        
## Murayama 2016     2.2384 [ 1.6660;  3.0014]        
## Oh 2013           0.0000 [ 0.0000; 16.8179]        
## Serrone 2016      0.5155 [ 0.0911;  2.8615]        
## So 2010           1.1450 [ 0.3902;  3.3118]        
## Sonobe 2010       1.5625 [ 0.7589;  3.1897]        
## Teo 2016          2.3810 [ 0.8130;  6.7666]        
## Tsukahara 2005    3.4722 [ 1.4921;  7.8703]        
## Tsutsumi 2000     4.3478 [ 1.4896; 12.0212]        
## Villablanca 2013  1.5544 [ 0.5300;  4.4697]        
## Wiebers-R 1998    1.3503 [ 0.8558;  2.1244]        
## Wiebers-P 2003    1.0204 [ 0.6193;  1.6768]        
## Wilkinson 2018    0.0000 [ 0.0000; 14.8655]        
## Zylkowski 2015    2.7273 [ 0.9318;  7.7131]        
## 
## Number of studies combined: k = 28
## 
##                      events           95%-CI
## Fixed effect model   1.5519 [1.3490; 1.7848]
## Random effects model 1.3764 [1.0807; 1.7515]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.0719; tau = 0.2682; I^2 = 30.0%; H = 1.20
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  25.58   27  0.5420        Wald-type
##  43.97   27  0.0208 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
## 
## Identified outliers (random-effects model) 
## ------------------------------------------ 
## "Juvela 2013" 
##  
## Results with outliers removed 
## ----------------------------- 
##                   events             95%-CI exclude
## Bor 2015          0.7444 [ 0.2535;  2.1655]        
## Broderick 2009    1.8519 [ 0.5093;  6.5019]        
## Burns 2009        0.5780 [ 0.1021;  3.2011]        
## Byoun 2016        1.7628 [ 0.9871;  3.1288]        
## Choi 2018         0.5780 [ 0.1021;  3.2011]        
## Gibbs 2004        0.0000 [ 0.0000; 14.8655]        
## Gondar 2016       0.8152 [ 0.2776;  2.3691]        
## Guresir 2013      0.7812 [ 0.2660;  2.2715]        
## Irazabal 2011     0.0000 [ 0.0000;  7.8652]        
## Jeon 2014         0.3521 [ 0.0966;  1.2746]        
## Jiang 2013        0.0000 [ 0.0000;  7.1348]        
## Juvela 2013      18.6747 [13.4796; 25.2868]       *
## Loumiotis 2011    0.0000 [ 0.0000;  2.3446]        
## Matsubara 2004    0.0000 [ 0.0000;  2.3736]        
## Matsumoto 2013    2.6549 [ 0.9069;  7.5160]        
## Mizoi 1995        0.0000 [ 0.0000; 15.4639]        
## Morita 2012       1.8658 [ 1.4582;  2.3845]        
## Murayama 2016     2.2384 [ 1.6660;  3.0014]        
## Oh 2013           0.0000 [ 0.0000; 16.8179]        
## Serrone 2016      0.5155 [ 0.0911;  2.8615]        
## So 2010           1.1450 [ 0.3902;  3.3118]        
## Sonobe 2010       1.5625 [ 0.7589;  3.1897]        
## Teo 2016          2.3810 [ 0.8130;  6.7666]        
## Tsukahara 2005    3.4722 [ 1.4921;  7.8703]        
## Tsutsumi 2000     4.3478 [ 1.4896; 12.0212]        
## Villablanca 2013  1.5544 [ 0.5300;  4.4697]        
## Wiebers-R 1998    1.3503 [ 0.8558;  2.1244]        
## Wiebers-P 2003    1.0204 [ 0.6193;  1.6768]        
## Wilkinson 2018    0.0000 [ 0.0000; 14.8655]        
## Zylkowski 2015    2.7273 [ 0.9318;  7.7131]        
## 
## Number of studies combined: k = 29
## 
##                      events           95%-CI
## Fixed effect model   1.4995 [1.3044; 1.7234]
## Random effects model 1.2681 [0.9741; 1.6495]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1209; tau = 0.3476; I^2 = 41.3%; H = 1.31
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  30.68   28  0.3313        Wald-type
##  51.52   28  0.0044 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations

Here, using the random-effects model, only Juvela 2013 is identified as an outlier (the fixed-effect model also flags Jeon 2014).

The pooled rupture risk with Juvela et al. included is 1.2378 [0.8149; 1.8760].

The pooled rupture risk without Juvela et al. is 1.2681 [0.9741; 1.6495].

Although the pooled rupture risk is still between 1-2%, the precision of the estimate has improved with narrower confidence intervals, and the I2 has reduced from 83% to 41.3%, which makes the result more robust and more generalisable.

Influence analysis

Apart from excluding statistical outliers, which reduce the robustness of the pooled result, it is important to identify studies in the meta-analysis that exert high influence on the pooled results. If a single study strongly influences the pooled effect size, the pooled result is also less robust and less generalisable.

A common method to identify influential studies is the leave-one-out method, where the results of the meta-analysis are recalculated leaving out one study at a time. This helps identify individual studies that may exert very high influence on the result, pushing the pooled effect size higher or lower.
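The idea can be sketched manually with the same metaprop() settings used elsewhere in this analysis (a minimal sketch; the InfluenceAnalysis() function used next performs this and several further diagnostics):

```r
library(meta)

# Leave-one-out sketch: refit the GLMM meta-analysis k times, dropping
# one study each time, and collect the pooled random-effects estimate
# (on the logit scale) with that study omitted.
loo.estimates <- sapply(seq_len(nrow(dat)), function(i) {
  m <- metaprop(rupt, total_size, data = dat[-i, ],
                method = "GLMM", sm = "PLOGIT",
                method.tau = "ML", hakn = TRUE, pscale = 100)
  m$TE.random
})
names(loo.estimates) <- paste("Omitting", dat$auth_year)
```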

inf.analysis.all <- InfluenceAnalysis(x = pes.summary.glmm.all, 
                                  random = TRUE,
                                  subplot.heights = c(30,18),
                                  subplot.widths = c(30,30),
                                  forest.lims = c(0,0.03))
## [===========================================================================] DONE
summary(inf.analysis.all)
## Leave-One-Out Analysis (Sorted by I2) 
##  ----------------------------------- 
##                           Effect   LLCI   ULCI    I2
## Omitting Juvela 2013      -4.355 -4.622 -4.088 0.413
## Omitting Morita 2012      -4.416 -4.862 -3.969 0.802
## Omitting Murayama 2016    -4.423 -4.868 -3.979 0.811
## Omitting Jeon 2014        -4.316 -4.735 -3.896 0.825
## Omitting Loumiotis 2011   -4.323 -4.737 -3.910 0.827
## Omitting Matsubara 2004   -4.324 -4.737 -3.911 0.828
## Omitting Wiebers-R 1998   -4.396 -4.843 -3.949 0.832
## Omitting Wiebers-P 2003   -4.378 -4.823 -3.933 0.832
## Omitting Irazabal 2011    -4.357 -4.778 -3.936 0.835
## Omitting Jiang 2013       -4.355 -4.776 -3.934 0.835
## Omitting Serrone 2016     -4.349 -4.775 -3.922 0.835
## Omitting Burns 2009       -4.354 -4.781 -3.926 0.836
## Omitting Choi 2018        -4.354 -4.781 -3.926 0.836
## Omitting Gibbs 2004       -4.367 -4.789 -3.945 0.836
## Omitting Mizoi 1995       -4.368 -4.790 -3.945 0.836
## Omitting Oh 2013          -4.369 -4.791 -3.946 0.836
## Omitting Tsukahara 2005   -4.427 -4.865 -3.989 0.836
## Omitting Tsutsumi 2000    -4.424 -4.859 -3.989 0.836
## Omitting Wilkinson 2018   -4.367 -4.789 -3.945 0.836
## Omitting Bor 2015         -4.358 -4.792 -3.923 0.837
## Omitting Byoun 2016       -4.408 -4.854 -3.963 0.837
## Omitting Guresir 2013     -4.360 -4.796 -3.925 0.837
## Omitting Gondar 2016      -4.363 -4.799 -3.927 0.838
## Omitting Matsumoto 2013   -4.412 -4.851 -3.973 0.840
## Omitting So 2010          -4.380 -4.819 -3.941 0.840
## Omitting Sonobe 2010      -4.400 -4.845 -3.955 0.840
## Omitting Zylkowski 2015   -4.413 -4.852 -3.974 0.840
## Omitting Broderick 2009   -4.396 -4.834 -3.958 0.841
## Omitting Teo 2016         -4.409 -4.849 -3.969 0.841
## Omitting Villablanca 2013 -4.393 -4.834 -3.953 0.841
## 
## 
## Influence Diagnostics 
##  ------------------- 
##                           rstudent dffits cook.d cov.r  QE.del   hat weight
## Omitting Bor 2015           -0.859 -0.170  0.029 1.040 180.056 0.038  3.791
## Omitting Broderick 2009      0.119  0.039  0.002 1.059 183.807 0.032  3.183
## Omitting Burns 2009         -0.852 -0.128  0.016 1.022 181.983 0.022  2.181
## Omitting Byoun 2016          0.091  0.048  0.002 1.097 183.153 0.052  5.179
## Omitting Choi 2018          -0.852 -0.128  0.016 1.022 181.983 0.022  2.181
## Omitting Gibbs 2004          0.177  0.028  0.001 1.024 183.889 0.013  1.315
## Omitting Gondar 2016        -0.758 -0.147  0.022 1.045 180.656 0.038  3.790
## Omitting Guresir 2013       -0.805 -0.158  0.025 1.043 180.381 0.038  3.791
## Omitting Irazabal 2011      -0.262 -0.025  0.001 1.022 183.615 0.013  1.327
## Omitting Jeon 2014          -1.558 -0.312  0.094 0.988 176.785 0.032  3.205
## Omitting Jiang 2013         -0.327 -0.033  0.001 1.021 183.532 0.013  1.328
## Omitting Juvela 2013         8.682  0.665  0.061 0.290  34.025 0.056  5.627
## Omitting Loumiotis 2011     -1.056 -0.126  0.016 1.007 181.880 0.013  1.335
## Omitting Matsubara 2004     -1.048 -0.125  0.016 1.008 181.905 0.013  1.335
## Omitting Matsumoto 2013      0.518  0.120  0.015 1.066 183.810 0.038  3.764
## Omitting Mizoi 1995          0.205  0.031  0.001 1.024 183.890 0.013  1.314
## Omitting Morita 2012         0.173  0.072  0.006 1.109 180.636 0.058  5.849
## Omitting Murayama 2016       0.413  0.127  0.017 1.104 183.884 0.058  5.777
## Omitting Oh 2013             0.265  0.038  0.001 1.024 183.885 0.013  1.312
## Omitting Serrone 2016       -0.946 -0.144  0.021 1.018 181.651 0.022  2.182
## Omitting So 2010            -0.387 -0.062  0.004 1.062 182.449 0.038  3.786
## Omitting Sonobe 2010        -0.058  0.011  0.000 1.088 182.877 0.048  4.800
## Omitting Teo 2016            0.400  0.098  0.010 1.068 183.882 0.038  3.768
## Omitting Tsukahara 2005      0.881  0.200  0.041 1.062 182.931 0.044  4.417
## Omitting Tsutsumi 2000       1.059  0.213  0.045 1.044 182.566 0.037  3.739
## Omitting Villablanca 2013   -0.058  0.008  0.000 1.069 183.452 0.038  3.780
## Omitting Wiebers-R 1998     -0.249 -0.035  0.001 1.096 178.560 0.055  5.478
## Omitting Wiebers-P 2003     -0.612 -0.132  0.018 1.077 173.475 0.054  5.384
## Omitting Wilkinson 2018      0.177  0.028  0.001 1.024 183.889 0.013  1.315
## Omitting Zylkowski 2015      0.547  0.125  0.016 1.065 183.781 0.038  3.763
##                           infl
## Omitting Bor 2015             
## Omitting Broderick 2009       
## Omitting Burns 2009           
## Omitting Byoun 2016           
## Omitting Choi 2018            
## Omitting Gibbs 2004           
## Omitting Gondar 2016          
## Omitting Guresir 2013         
## Omitting Irazabal 2011        
## Omitting Jeon 2014            
## Omitting Jiang 2013           
## Omitting Juvela 2013         *
## Omitting Loumiotis 2011       
## Omitting Matsubara 2004       
## Omitting Matsumoto 2013       
## Omitting Mizoi 1995           
## Omitting Morita 2012          
## Omitting Murayama 2016        
## Omitting Oh 2013              
## Omitting Serrone 2016         
## Omitting So 2010              
## Omitting Sonobe 2010          
## Omitting Teo 2016             
## Omitting Tsukahara 2005       
## Omitting Tsutsumi 2000        
## Omitting Villablanca 2013     
## Omitting Wiebers-R 1998       
## Omitting Wiebers-P 2003       
## Omitting Wilkinson 2018       
## Omitting Zylkowski 2015       
## 
## 
## Baujat Diagnostics (sorted by Heterogeneity Contribution) 
##  ------------------------------------------------------- 
##                           HetContrib InfluenceEffectSize
## Omitting Juvela 2013         132.737              33.728
## Omitting Wiebers-P 2003        9.713              11.097
## Omitting Jeon 2014             7.041              13.333
## Omitting Wiebers-R 1998        4.901              13.587
## Omitting Bor 2015              3.782              15.267
## Omitting Guresir 2013          3.461              15.443
## Omitting Gondar 2016           3.190              15.591
## Omitting Morita 2012           2.356              15.739
## Omitting Serrone 2016          2.229              16.286
## Omitting Loumiotis 2011        2.005              16.136
## Omitting Matsubara 2004        1.980              16.154
## Omitting Burns 2009            1.898              16.485
## Omitting Choi 2018             1.898              16.485
## Omitting So 2010               1.422              16.590
## Omitting Tsutsumi 2000         1.307              18.466
## Omitting Sonobe 2010           0.981              16.729
## Omitting Tsukahara 2005        0.938              18.723
## Omitting Byoun 2016            0.701              16.969
## Omitting Villablanca 2013      0.432              17.252
## Omitting Jiang 2013            0.357              17.180
## Omitting Irazabal 2011         0.274              17.228
## Omitting Zylkowski 2015        0.107              18.061
## Omitting Broderick 2009        0.082              17.590
## Omitting Matsumoto 2013        0.079              18.032
## Omitting Teo 2016              0.008              17.904
## Omitting Murayama 2016         0.005              20.358
## Omitting Oh 2013               0.005              17.479
## Omitting Gibbs 2004            0.001              17.450
## Omitting Wilkinson 2018        0.001              17.450
## Omitting Mizoi 1995            0.000              17.460
plot(inf.analysis.all, "baujat")

The Baujat plot (Baujat et al. 2002) is a diagnostic plot to detect studies that contribute disproportionately to heterogeneity. The study's contribution to Cochran's Q is plotted on the horizontal axis, and the study's influence on the pooled effect size on the vertical axis. Studies on the right side of the plot are of interest, since they contribute most to the heterogeneity.

Here Juvela et al. is identified as both a high contributor to heterogeneity and a high influence on the pooled result.

plot(inf.analysis.all, "influence")

We can also produce additional plots to assess the influence of each study. These plots, proposed by Viechtbauer and Cheung (2010), also demonstrate that the Juvela study is not only a statistical outlier but also an influential study. This is important because it further corroborates that the Juvela study adds statistical heterogeneity and influences the overall pooled result.

plot(inf.analysis.all, "i2")

In this plot, we can see the impact of the leave-one-out analysis on the pooled effect size, ordered by the reduction in statistical heterogeneity as measured by I2.

We can again see that the precision of the pooled effect size estimate improves with exclusion of Juvela et al., and that I2 reduces the most with exclusion of Juvela et al.

These findings corroborate the outlier analysis, the Baujat plot, and the influence analysis proposed by Viechtbauer and Cheung: Juvela is the main source of heterogeneity.

Why Juvela et al. is excluded

The outlier and influence analyses are concordant that Juvela et al. is an outlier which influences the pooled effect estimate and reduces its precision. This is on the basis of statistical heterogeneity: the rupture proportion observed in the Juvela et al. study varies more than expected from random chance alone.

By analogy, while the remaining studies are different varieties of apples, the Juvela study is an orange, and combining it with all the other studies would yield a much less meaningful result.

Since statistical heterogeneity is a consequence of clinical and/or methodological heterogeneity, we can explore whether particular clinical or methodological factors are likely to be responsible.

Clinical and methodological factors likely to explain the statistical heterogeneity are the proportion of patients with exposure to prior subarachnoid haemorrhage, the patient enrolment period, and the length of follow up.

Meta-analysis of proportions using GLMM and Wilson CIs with Juvela excluded

Since it is reasonable to exclude Juvela, we can re-calculate the main study result.

dat10 <- dat %>%
  slice(-12)  # remove row 12 (Juvela 2013)

pes.summary.glmm = metaprop(rupt, total_size,
                            data=dat10,
                            studlab=paste(auth_year),
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm
##                  events            95%-CI
## Bor 2015         0.7444 [0.2535;  2.1655]
## Broderick 2009   1.8519 [0.5093;  6.5019]
## Burns 2009       0.5780 [0.1021;  3.2011]
## Byoun 2016       1.7628 [0.9871;  3.1288]
## Choi 2018        0.5780 [0.1021;  3.2011]
## Gibbs 2004       0.0000 [0.0000; 14.8655]
## Gondar 2016      0.8152 [0.2776;  2.3691]
## Guresir 2013     0.7812 [0.2660;  2.2715]
## Irazabal 2011    0.0000 [0.0000;  7.8652]
## Jeon 2014        0.3521 [0.0966;  1.2746]
## Jiang 2013       0.0000 [0.0000;  7.1348]
## Loumiotis 2011   0.0000 [0.0000;  2.3446]
## Matsubara 2004   0.0000 [0.0000;  2.3736]
## Matsumoto 2013   2.6549 [0.9069;  7.5160]
## Mizoi 1995       0.0000 [0.0000; 15.4639]
## Morita 2012      1.8658 [1.4582;  2.3845]
## Murayama 2016    2.2384 [1.6660;  3.0014]
## Oh 2013          0.0000 [0.0000; 16.8179]
## Serrone 2016     0.5155 [0.0911;  2.8615]
## So 2010          1.1450 [0.3902;  3.3118]
## Sonobe 2010      1.5625 [0.7589;  3.1897]
## Teo 2016         2.3810 [0.8130;  6.7666]
## Tsukahara 2005   3.4722 [1.4921;  7.8703]
## Tsutsumi 2000    4.3478 [1.4896; 12.0212]
## Villablanca 2013 1.5544 [0.5300;  4.4697]
## Wiebers-R 1998   1.3503 [0.8558;  2.1244]
## Wiebers-P 2003   1.0204 [0.6193;  1.6768]
## Wilkinson 2018   0.0000 [0.0000; 14.8655]
## Zylkowski 2015   2.7273 [0.9318;  7.7131]
## 
## Number of studies combined: k = 29
## 
##                      events           95%-CI
## Fixed effect model   1.4995 [1.3044; 1.7234]
## Random effects model 1.2681 [0.9741; 1.6495]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1209; tau = 0.3476; I^2 = 41.3%; H = 1.31
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  30.68   28  0.3313        Wald-type
##  51.52   28  0.0044 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat10 %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.72

The output from the random effects meta-analysis using the GLMM:

The summary estimate is 1.2681 per 100 [0.9741; 1.6495] with I2 of 41.3%, over a study-level mean follow up of 3.7 years.

Display meta-analysis result using a publication quality forest plot

The easiest way to communicate the result of the statistical procedure is via a forest plot.

forest(pes.summary.glmm,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 3.7 years",
       JAMA.pval = TRUE,
       ) 

Exploring residual heterogeneity

Now we can explore residual heterogeneity through meta-regression.

Meta-regression allows the effects of multiple continuous and categorical variables to be investigated simultaneously, unlike subgroup analysis, which considers only one categorical variable. A subgroup analysis generally uses a random-effects model within each subgroup, and then applies additional statistical testing to compare the pooled results across the subgroups. Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis. The meta-regression coefficient obtained describes how the outcome (dependent variable) changes with a unit increase in the potential effect modifier. If the outcome is a ratio measure, the log-transformed value should be used in the regression model (Cochrane Handbook 10.11.4).

The statistical significance of the regression coefficient tests whether there is a linear relationship between the outcome variable and the potential effect modifier. An association identified with one trial characteristic may in reality reflect a true association with other correlated characteristics, whether known or unknown. It is not possible to adjust for all potential confounders, and this can lead to misleading conclusions. Even if a large number of covariates are adjusted for, we cannot be certain that all potential confounders have been identified. These analyses cannot prove causality; at best, they can be considered hypothesis generating.

This is an important concept, because if meta-regression findings are presented as definitive conclusions, there is a risk of patients being denied an effective intervention or treated with an ineffective or harmful intervention. It can also generate misleading recommendations about future research directions, which, if followed, would waste scarce research resources.

When reporting results from meta-regression, additional important factors should be considered:

  1. Whether a statistically significant association was detected
  2. The covariate distribution (i.e. the number of trials and participants contributing to each subgroup)
  3. The plausibility of the association or lack of association
  4. The importance of the association or lack of association
  5. The possibility of confounding

Moving forward, we shall choose to pool the data, explore heterogeneity in the synthesised data with meta-regression, and accurately report the interpretation of the meta-regression with clinical context.

Meta-regression

The purpose of meta-regression is to explore whether particular covariates explain the heterogeneity of effects between the studies included.

Meta-regression can be performed with a categorical trial covariate, which yields a subgroup analysis. The important focus is the test for differences across the subgroups, rather than the effect size in each subgroup, because the goal is to explore heterogeneity. Thus residual heterogeneity should be taken into account and a random-effects meta-regression performed. The amount of residual heterogeneity should also be reported.
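Such a subgroup analysis can be sketched with meta's byvar argument (a hedged sketch: pop here is a hypothetical study-level factor, e.g. a Japanese / non-Japanese grouping, assumed to be aligned with the studies in dat10; the key output is the test for subgroup differences, not the subgroup estimates themselves):

```r
library(meta)

# Random-effects subgroup analysis by a categorical study-level covariate.
# `dat10$pop` is assumed to be a factor with one level per subgroup.
pes.subgroup <- update(pes.summary.glmm, byvar = dat10$pop,
                       comb.fixed = FALSE, comb.random = TRUE)
pes.subgroup  # includes the between-subgroup heterogeneity (Q) test
```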

Meta-regression can also be performed with continuous trial covariates, which yields the typical meta-regression plots. Visual presentation is helpful, ideally using symbol sizes that reflect the precision of the study-level effect estimate.

Many methods for meta-regression exist to accommodate the varied ways in which data are presented (i.e., study-level summary counts for the cells of 2×2 tables, or study-level relative proportions) and the assumed nature of the variability observed across studies (fixed-effects vs. random-effects meta-regression).

In our case, the most robust data are the study-level relative proportions. While study-level summary counts for the cells of 2×2 tables could be calculated, this is likely to introduce additional bias due to the lack of individual patient-level data, sparse-data bias, and bias introduced by assumptions about multiplicity of aneurysms.

The method of meta-regression will be random effects meta-regression since there is expected to be residual between-study variability due to clinical and methodological variability across the studies.

To reduce the risk of false positives, and to ensure that there are adequate studies for meta-regression, only one potential covariate is considered at a time. Multivariable meta-regression could be considered with up to two covariates, for covariates with a p value of less than 0.1 and with lower residual heterogeneity than the overall pooled analysis; however, this is typically not done when using aggregate data, due to the increased risk of false positive results.

Regardless, the outcomes of these meta-regression analyses are limited. They are entirely observational and susceptible to confounding bias from other study-level characteristics. Importantly, they are not generalisable, since there is aggregation bias due to study-level rather than patient-level data being used for analysis. This aggregation bias may cause the analysis to miss important associations that would be detected using individual patient data.

In our study, we have pre-specified further data analysis with regard to exposure to prior subarachnoid haemorrhage and aneurysm size in our published study protocol. Thus the following meta-regression analyses will be performed.

  1. Meta-regression of proportion of patients included with exposure to prior aSAH as a continuous variable on rupture risk estimate per 100 SUIAs ≤10mm managed with follow up.

  2. Meta-regression of proportion of patients included with ≤5mm aneurysms as a continuous variable on rupture risk estimate per 100 SUIAs ≤10mm managed with follow up.

  3. Meta-regression of proportion of patients included with multiple aneurysms as a continuous variable on rupture risk estimate per 100 SUIAs ≤10mm managed with follow up.

  4. Meta-regression of age as a continuous variable on rupture risk estimate per 100 SUIAs ≤10mm managed with follow up.

  5. Sub-group analysis of pooled rupture risk estimate per 100 SUIAs ≤10mm managed with follow up with mean age categorised as ≤65-years or >65-years.

  6. Sub-group analysis of pooled rupture risk estimate per 100 SUIAs ≤10mm managed with follow up with source population categorised as Japanese or non-Japanese.

  7. Sub-group analysis of pooled rupture risk estimate per 100 SUIAs ≤10mm managed with follow up with study type categorised as prospective or retrospective.

  8. Meta-regression of follow up time as a continuous variable on rupture risk estimate per 100 SUIAs ≤10mm.

Meta-regression for prior SAH exposure

We can analyse whether there is a relationship between the proportion of patients with prior SAH included in the study (the explanatory variable / potential effect modifier) and the outcome (rupture proportion).

The choice of this covariate is:

  • Pre-specified
  • Biologically plausible
  • Supported by published observational data
  • Relevant to physicians
dat.psah.metareg <- dat %>%
  slice(-12) %>%
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  select(auth_year, rupt, total_size, prop_psah) %>%
  drop_na(total_size) %>%
  drop_na(prop_psah)
psah.metareg = metaprop(rupt, total_size,
                      data=dat.psah.metareg,
                      studlab=paste(auth_year),
                      method="GLMM",
                      sm="PLOGIT",
                      method.tau = "ML", 
                      method.ci = "WS",
                      hakn = TRUE,
                      pscale = 100
                      )

psah.metareg.result <- metareg(psah.metareg, ~prop_psah, hakn = TRUE)

psah.metareg.result
## 
## Mixed-Effects Model (k = 27; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.1848
## tau (square root of estimated tau^2 value):             0.4299
## I^2 (residual heterogeneity / unaccounted variability): 48.0950%
## H^2 (unaccounted variability / sampling variability):   1.9266
## 
## Tests for Residual Heterogeneity:
## Wld(df = 25) = 26.8283, p-val = 0.3645
## LRT(df = 25) = 49.7222, p-val = 0.0023
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 25) = 0.3559, p-val = 0.5562
## 
## Model Results:
## 
##            estimate      se      tval    pval    ci.lb    ci.ub 
## intrcpt     -4.5062  0.2344  -19.2266  <.0001  -4.9889  -4.0235  *** 
## prop_psah    0.5803  0.9728    0.5966  0.5562  -1.4232   2.5839      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
range(dat.psah.metareg$prop_psah) * 100
## [1]  0.00000 49.82747
mean(dat.psah.metareg$prop_psah) * 100
## [1] 11.33309
sum(dat.psah.metareg$rupt)
## [1] 181
sum(dat.psah.metareg$total_size)
## [1] 12187
bubble(psah.metareg.result,
       xlim=c(0, 0.5),
       xlab = "Proportion of patients with exposure to prior aSAH",
       ylab = "Rupture risk estimate logit tranformed proportion",
       lty = 3,
       lwd = 2,
       studlab=dat.psah.metareg$auth_year, 
       cex.studlab = 0.75
       )

Interpretation:

The random effects metaregression included participants from 27 studies with a mean study-level proportion of exposure to prior subarachnoid haemorrhage of 11.3% (range 0 - 49.8%). There were a total of 181 ruptures and 12 187 aneurysms in the studies included in this metaregression analysis. The metaregression result was not significant (F[1,25] = 0.3559, P = .5562), suggesting that the proportion of patients with prior SAH included in the study was not associated with the risk of rupture at the study level. However, there is moderate unexplained residual heterogeneity (I2 = 48.1%), which is higher than across all included studies, indicating that this meta-regression analysis was not informative in identifying sources of heterogeneity in the meta-analysis result. In addition, this random effects metaregression analysis is susceptible to confounding bias from other study-level characteristics, and is not generalisable to an individual patient, since there is aggregation bias due to study-level data being used for analysis.

Meta-regression to explore the proportion of ≤5mm UIAs included

We can analyse whether there is a relationship between the proportion of patients with aneurysms measuring 5mm and less included in the study (the explanatory variable) and the outcome (rupture proportion).

The choice of this covariate is:

  • Pre-specified
  • Biologically plausible
  • Supported by published observational data
  • Relevant to physicians

Since we are analysing at a study level, and we do not know the aneurysm characteristics of individual patients, this is an exploratory and hypothesis generating analysis.

sizedata5 <- filter(maindata, size == 5) 
dat5 <- sizedata5 %>%
  mutate(prop_multi = multi_tot / num_tot,
         num_multi = prop_multi * num + num,
         num_multi_temp = coalesce(num_anr, num_multi),
         total_size_temp = coalesce(num_anr, num),
         total_size_temp_2 = coalesce(num_multi_temp, total_size_temp),
         total_size = round(total_size_temp_2, 0),
         psah_size_temp = psah * prop_multi + psah,
         prop_psah = psah_tot / num_tot,
         num_anr_psah = prop_psah * total_size,
         size_psah_temp = coalesce(psah_size_temp, num_anr_psah),
         psah_size = round(size_psah_temp, 0),
         ) %>%
  mutate(fu = coalesce(fu_mean_tot,fu_med_tot)) %>% 
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  mutate(pop = fct_collapse(country, 
                            "Japanese" = "Japan", 
                            "Non-Japanese" = c("International", "United States", "Switzerland", "Australia", "Korea", "Poland", "China", "Germany", "United Kingdom", "Finland"))
         )
dat10.metareg <- dat %>%
  slice(-12) %>%
  rename(total_size10 = total_size) %>%
  rename(rupt10 = rupt) %>%
  select(auth_year, rupt10, total_size10)

dat5.metareg <- dat5 %>%
  slice(-12) %>%
  rename(total_size5 = total_size) %>%
  rename(rupt5 = rupt) %>%
  select(auth_year, rupt5, total_size5)

dat.size5.metareg <- left_join(dat5.metareg, dat10.metareg) %>%
  mutate(prop_size5 = total_size5 / total_size10) %>%
  select(auth_year, rupt10, total_size10, prop_size5) %>%
  drop_na(prop_size5)
## Joining, by = "auth_year"
size5.metareg = metaprop(rupt10, total_size10,
                      data=dat.size5.metareg,
                      studlab=paste(auth_year),
                      method="GLMM",
                      sm="PLOGIT",
                      method.tau = "ML", 
                      method.ci = "WS",
                      hakn = TRUE,
                      pscale = 100
                      )

size5mm.metareg.result <- metareg(size5.metareg, ~prop_size5, hakn = TRUE)

size5mm.metareg.result
## 
## Mixed-Effects Model (k = 24; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.1286
## tau (square root of estimated tau^2 value):             0.3585
## I^2 (residual heterogeneity / unaccounted variability): 37.9463%
## H^2 (unaccounted variability / sampling variability):   1.6115
## 
## Tests for Residual Heterogeneity:
## Wld(df = 22) = 24.3599, p-val = 0.3287
## LRT(df = 22) = 41.6562, p-val = 0.0069
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 22) = 0.0271, p-val = 0.8707
## 
## Model Results:
## 
##             estimate      se     tval    pval    ci.lb    ci.ub 
## intrcpt      -4.4176  0.6657  -6.6357  <.0001  -5.7982  -3.0369  *** 
## prop_size5    0.1364  0.8283   0.1647  0.8707  -1.5814   1.8542      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
range(dat.size5.metareg$prop_size5) * 100
## [1]  24.74227 100.00000
mean(dat.size5.metareg$prop_size5) * 100
## [1] 76.73934
sum(dat.size5.metareg$rupt10)
## [1] 173
sum(dat.size5.metareg$total_size10)
## [1] 10882
bubble(size5mm.metareg.result, 
       xlim=c(0, 1),
       xlab = "Proportion of patients with ≤5mm SIUAs",
       ylab = "Rupture risk estimate logit transformed proportion",
       lty = 3,
       lwd = 2,
       studlab=dat.size5.metareg$auth_year, 
       cex.studlab = 0.75
       )

Interpretation:

The random effects metaregression included participants from 24 studies with a mean study level proportion of ≤5mm aneurysms of 77% (range 25 - 100%). There were a total of 173 ruptures and 10 882 aneurysms in the studies included in this metaregression analysis. The metaregression result was not significant (F(1, 22) = 0.0271, P = 0.8707), suggesting that the proportion of patients with ≤5mm aneurysms included in a study was not associated with the risk of rupture at the study level. However, there is moderate unexplained residual heterogeneity (I2 = 38%), which is similar to that across all included studies, indicating that this meta-regression analysis was not informative in identifying sources of heterogeneity in the meta-analysis result. In addition, this random effects metaregression analysis is susceptible to confounding bias from other study level characteristics, and is not generalisable to an individual patient, since there is aggregation bias due to study-level data being used for analysis.

Metaregression to explore proportion with multiple aneurysms

dat.multi.metareg <- dat %>%
  slice(-12) %>%
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  select(auth_year, rupt, total_size, prop_multi) %>%
  drop_na(total_size) %>%
  drop_na(prop_multi)
multi.metareg = metaprop(rupt, total_size,
                      data=dat.multi.metareg,
                      studlab=paste(auth_year),
                      method="GLMM",
                      sm="PLOGIT",
                      method.tau = "ML", 
                      method.ci = "WS",
                      hakn = TRUE,
                      pscale = 100
                      )

multi.metareg.result <- metareg(multi.metareg, ~prop_multi, hakn = TRUE)

multi.metareg.result
## 
## Mixed-Effects Model (k = 19; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0991
## tau (square root of estimated tau^2 value):             0.3149
## I^2 (residual heterogeneity / unaccounted variability): 36.6303%
## H^2 (unaccounted variability / sampling variability):   1.5780
## 
## Tests for Residual Heterogeneity:
## Wld(df = 17) = 20.8124, p-val = 0.2348
## LRT(df = 17) = 35.4303, p-val = 0.0055
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 17) = 0.0006, p-val = 0.9813
## 
## Model Results:
## 
##             estimate      se     tval    pval    ci.lb    ci.ub 
## intrcpt      -4.3745  0.5050  -8.6618  <.0001  -5.4400  -3.3090  *** 
## prop_multi    0.0337  1.4205   0.0237  0.9813  -2.9634   3.0308      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
range(dat.multi.metareg$prop_multi) * 100
## [1]  0.00000 51.71103
mean(dat.multi.metareg$prop_multi) * 100
## [1] 28.7982
sum(dat.multi.metareg$rupt)
## [1] 168
sum(dat.multi.metareg$total_size)
## [1] 11058
bubble(multi.metareg.result, 
       xlim=c(0, 1.0),
       xlab = "Proportion of patients with multiple SIUAs",
       ylab = "Rupture risk estimate logit transformed proportion",
       lty = 3,
       lwd = 2,
       studlab=dat.multi.metareg$auth_year, 
       cex.studlab = 0.75
       )

Interpretation:

The random effects metaregression included participants from 19 studies with a mean study level proportion of aneurysm multiplicity of 28.8% (range 0 - 51.7%). There were a total of 168 ruptures and 11 058 aneurysms in the studies included in this metaregression analysis. The metaregression result was not significant (F(1, 17) = 0.0006, P = 0.9813), suggesting that the proportion of patients with multiple aneurysms included in a study was not associated with the risk of rupture at the study level. However, there is moderate unexplained residual heterogeneity (I2 = 36.6%), which is similar to that across all included studies, indicating that this meta-regression analysis was not informative in identifying sources of heterogeneity in the meta-analysis result. In addition, this random effects metaregression analysis is susceptible to confounding bias from other study level characteristics, and is not generalisable to an individual patient, since there is aggregation bias due to study-level data being used for analysis.

Meta-regression based on mean age - as continuous variable

dat.age.metareg <- dat %>%
  slice(-12) %>%
  mutate(age = coalesce(age_mean, age_med)) %>%
  drop_na(age) %>%
  select(auth_year, rupt, total_size, age)
age.metareg = metaprop(rupt, total_size,
                      data=dat.age.metareg,
                      studlab=paste(auth_year),
                      method="GLMM",
                      sm="PLOGIT",
                      method.tau = "ML", 
                      method.ci = "WS",
                      hakn = TRUE,
                      pscale = 100
                      )

age.metareg.result <- metareg(age.metareg, ~age, hakn = TRUE)

age.metareg.result
## 
## Mixed-Effects Model (k = 28; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0
## tau (square root of estimated tau^2 value):             0
## I^2 (residual heterogeneity / unaccounted variability): 0.0000%
## H^2 (unaccounted variability / sampling variability):   1.0000
## 
## Tests for Residual Heterogeneity:
## Wld(df = 26) = 16.4840, p-val = 0.9238
## LRT(df = 26) = 34.9424, p-val = 0.1129
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 26) = 12.9125, p-val = 0.0013
## 
## Model Results:
## 
##          estimate      se     tval    pval    ci.lb    ci.ub 
## intrcpt   -7.4817  0.9261  -8.0786  <.0001  -9.3853  -5.5780  *** 
## age        0.0537  0.0149   3.5934  0.0013   0.0230   0.0844   ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
range(dat.age.metareg$age) 
## [1] 47.9 72.2
mean(dat.age.metareg$age)
## [1] 58.62536
sd(dat.age.metareg$age)
## [1] 6.417667
sum(dat.age.metareg$rupt)
## [1] 190
sum(dat.age.metareg$total_size)
## [1] 12860
bubble(age.metareg.result, 
       xlim=c(40, 80), 
       xlab = "Mean study age",
       ylab = "Rupture risk estimate logit transformed proportion",
       lty = 3,
       lwd = 2,
       studlab=dat.age.metareg$auth_year,
       cex.studlab = 0.75,
       )

Interpretation:

The random effects metaregression included participants from 28 studies with a mean study level patient age of 58.6 +/- 6.4 years (range 47.9 - 72.2 years). There were a total of 190 ruptures and 12 860 aneurysms in the studies included in this metaregression analysis. The metaregression result was significant (F(1, 26) = 12.9125, P = 0.0013), suggesting that mean age was associated with the risk of rupture at the study level. There was no unexplained residual heterogeneity (I2 = 0%), indicating that this meta-regression analysis was informative in identifying sources of heterogeneity in the meta-analysis result. However, this random effects metaregression analysis is susceptible to confounding bias from other study level characteristics, and is not generalisable to an individual patient, since there is aggregation bias due to study-level data being used for analysis.

Sub-group analysis based on study age - Age ≤65 vs >65 years

dat.age.cat <- dat %>%
  slice(-12) %>%
  drop_na(age_cat)

pes.summary.glmm.age = metaprop(rupt, total_size,
                            data=dat.age.cat,
                            studlab=paste(auth_year),
                            byvar = age_cat,
                            bylab = "Age",
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            )
                            
pes.summary.glmm.age
##                  events            95%-CI       Age
## Bor 2015         0.7444 [0.2535;  2.1655] ≤65-years
## Broderick 2009   1.8519 [0.5093;  6.5019] ≤65-years
## Burns 2009       0.5780 [0.1021;  3.2011] ≤65-years
## Byoun 2016       1.7628 [0.9871;  3.1288] ≤65-years
## Choi 2018        0.5780 [0.1021;  3.2011] ≤65-years
## Gibbs 2004       0.0000 [0.0000; 14.8655] ≤65-years
## Gondar 2016      0.8152 [0.2776;  2.3691] ≤65-years
## Guresir 2013     0.7812 [0.2660;  2.2715] ≤65-years
## Irazabal 2011    0.0000 [0.0000;  7.8652] ≤65-years
## Jeon 2014        0.3521 [0.0966;  1.2746] ≤65-years
## Jiang 2013       0.0000 [0.0000;  7.1348] ≤65-years
## Loumiotis 2011   0.0000 [0.0000;  2.3446] ≤65-years
## Matsubara 2004   0.0000 [0.0000;  2.3736] ≤65-years
## Matsumoto 2013   2.6549 [0.9069;  7.5160] ≤65-years
## Mizoi 1995       0.0000 [0.0000; 15.4639] ≤65-years
## Morita 2012      1.8658 [1.4582;  2.3845] ≤65-years
## Murayama 2016    2.2384 [1.6660;  3.0014] >65-years
## Oh 2013          0.0000 [0.0000; 16.8179] >65-years
## Serrone 2016     0.5155 [0.0911;  2.8615] ≤65-years
## So 2010          1.1450 [0.3902;  3.3118] ≤65-years
## Sonobe 2010      1.5625 [0.7589;  3.1897] ≤65-years
## Teo 2016         2.3810 [0.8130;  6.7666] ≤65-years
## Tsutsumi 2000    4.3478 [1.4896; 12.0212] >65-years
## Villablanca 2013 1.5544 [0.5300;  4.4697] ≤65-years
## Wiebers-R 1998   1.3503 [0.8558;  2.1244] ≤65-years
## Wiebers-P 2003   1.0204 [0.6193;  1.6768] ≤65-years
## Wilkinson 2018   0.0000 [0.0000; 14.8655] ≤65-years
## Zylkowski 2015   2.7273 [0.9318;  7.7131] ≤65-years
## 
## Number of studies combined: k = 28
## 
##                      events           95%-CI
## Fixed effect model   1.4774 [1.2828; 1.7011]
## Random effects model 1.2256 [0.9333; 1.6081]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1164; tau = 0.3411; I^2 = 40.5%; H = 1.30
## 
## Quantifying residual heterogeneity:
##  I^2 = 0.0% [0.0%; 29.7%]; H = 1.00 [1.00; 1.19]
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  28.05   27  0.4081        Wald-type
##  48.70   27  0.0064 Likelihood-Ratio
## 
## Results for subgroups (fixed effect model):
##                   k events           95%-CI     Q   I^2
## Age = ≤65-years  25 1.3271 [1.1281; 1.5605] 19.98 27.4%
## Age = >65-years   3 2.2897 [1.7193; 3.0435]  1.26  0.0%
## 
## Test for subgroup differences (fixed effect model):
##                    Q d.f. p-value
## Between groups 10.53    1  0.0012
## Within groups  21.24   26  0.7295
## 
## Results for subgroups (random effects model):
##                   k events           95%-CI  tau^2    tau
## Age = ≤65-years  25 1.1375 [0.8500; 1.5207] 0.0793 0.2817
## Age = >65-years   3 2.2897 [1.2184; 4.2623]      0      0
## 
## Test for subgroup differences (random effects model):
##                      Q d.f. p-value
## Between groups   11.89    1  0.0006
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat10 %>%
  drop_na(age_cat) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.72
dat10 %>%
  drop_na(age_cat) %>%
  filter(age <= 65) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.77
dat10 %>%
  drop_na(age_cat) %>%
  filter(age >65 ) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.36
forest(pes.summary.glmm.age,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap.left = "10mm",
       colgap.forest.left = "10mm",
       colgap.forest.right = "0mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture Proportion per 100",
       pooled.events = TRUE,
       test.subgroup.random = TRUE,
       label.test.effect.subgroup.random = TRUE,
       JAMA.pval = TRUE,
       text.addline1 = "Age ≤65 years mean follow up: 3.8 years",
       text.addline2 = "Age >65 years mean follow up: 3.4 years",
       ) 

The test for subgroup differences (random effects model) indicates that there is a statistically significant subgroup effect (p = 0.0006), suggesting that a mean study age >65 years was associated with a higher rupture risk compared to a mean study age ≤65 years across the included studies.

However, the overall covariate distribution is concerning, with few rupture events and SIUAs in the >65-year-old cohort. There may be additional unknown confounders influencing the results of the subgroup analysis, further reducing its ability to provide useful findings. Moreover, this does not apply to the individual patient, since differences in risk estimates may be confounded by unmeasured or unreported factors.
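The sparseness of this covariate distribution can be checked directly from the study-level data; a minimal sketch, assuming the `dat.age.cat` data frame and the `rupt`, `total_size` and `age_cat` columns used above:

```r
library(dplyr)

# Tabulate studies, ruptures, and aneurysms per age sub-group,
# making explicit how few events fall in the >65-year stratum
dat.age.cat %>%
  group_by(age_cat) %>%
  summarise(studies   = n(),
            ruptures  = sum(rupt),
            aneurysms = sum(total_size))
```

A lopsided count of events between strata (here only 3 studies in the >65-year group) is what undermines the subgroup comparison, regardless of its p-value.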

Sub-group analysis based on study population - Japanese vs non-Japanese

dat.pop <- dat %>%
  slice(-12)

pes.summary.glmm.pop = metaprop(rupt, total_size,
                            data=dat.pop,
                            studlab=paste(auth_year),
                            byvar = pop,
                            bylab = "Population",
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            pscale = 100,
                            hakn = TRUE,
                            ) 
pes.summary.glmm.pop
##                  events            95%-CI   Population
## Bor 2015         0.7444 [0.2535;  2.1655] Non-Japanese
## Broderick 2009   1.8519 [0.5093;  6.5019] Non-Japanese
## Burns 2009       0.5780 [0.1021;  3.2011] Non-Japanese
## Byoun 2016       1.7628 [0.9871;  3.1288] Non-Japanese
## Choi 2018        0.5780 [0.1021;  3.2011] Non-Japanese
## Gibbs 2004       0.0000 [0.0000; 14.8655] Non-Japanese
## Gondar 2016      0.8152 [0.2776;  2.3691] Non-Japanese
## Guresir 2013     0.7812 [0.2660;  2.2715] Non-Japanese
## Irazabal 2011    0.0000 [0.0000;  7.8652] Non-Japanese
## Jeon 2014        0.3521 [0.0966;  1.2746] Non-Japanese
## Jiang 2013       0.0000 [0.0000;  7.1348] Non-Japanese
## Loumiotis 2011   0.0000 [0.0000;  2.3446] Non-Japanese
## Matsubara 2004   0.0000 [0.0000;  2.3736]     Japanese
## Matsumoto 2013   2.6549 [0.9069;  7.5160]     Japanese
## Mizoi 1995       0.0000 [0.0000; 15.4639]     Japanese
## Morita 2012      1.8658 [1.4582;  2.3845]     Japanese
## Murayama 2016    2.2384 [1.6660;  3.0014]     Japanese
## Oh 2013          0.0000 [0.0000; 16.8179] Non-Japanese
## Serrone 2016     0.5155 [0.0911;  2.8615] Non-Japanese
## So 2010          1.1450 [0.3902;  3.3118] Non-Japanese
## Sonobe 2010      1.5625 [0.7589;  3.1897]     Japanese
## Teo 2016         2.3810 [0.8130;  6.7666] Non-Japanese
## Tsukahara 2005   3.4722 [1.4921;  7.8703] Non-Japanese
## Tsutsumi 2000    4.3478 [1.4896; 12.0212]     Japanese
## Villablanca 2013 1.5544 [0.5300;  4.4697] Non-Japanese
## Wiebers-R 1998   1.3503 [0.8558;  2.1244] Non-Japanese
## Wiebers-P 2003   1.0204 [0.6193;  1.6768] Non-Japanese
## Wilkinson 2018   0.0000 [0.0000; 14.8655] Non-Japanese
## Zylkowski 2015   2.7273 [0.9318;  7.7131] Non-Japanese
## 
## Number of studies combined: k = 29
## 
##                      events           95%-CI
## Fixed effect model   1.4995 [1.3044; 1.7234]
## Random effects model 1.2681 [0.9741; 1.6495]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1209; tau = 0.3476; I^2 = 41.3%; H = 1.31
## 
## Quantifying residual heterogeneity:
##  I^2 = 0.0% [0.0%; 26.7%]; H = 1.00 [1.00; 1.17]
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  30.68   28  0.3313        Wald-type
##  51.52   28  0.0044 Likelihood-Ratio
## 
## Results for subgroups (fixed effect model):
##                             k events           95%-CI     Q  I^2
## Population = Non-Japanese  22 1.1078 [0.8869; 1.3829] 18.04 2.0%
## Population = Japanese       7 1.9494 [1.6300; 2.3300]  3.35 0.0%
## 
## Test for subgroup differences (fixed effect model):
##                    Q d.f. p-value
## Between groups 15.12    1  0.0001
## Within groups  21.39   27  0.7678
## 
## Results for subgroups (random effects model):
##                             k events           95%-CI  tau^2    tau
## Population = Non-Japanese  22 1.1033 [0.8410; 1.4463] 0.0061 0.0782
## Population = Japanese       7 1.9494 [1.5590; 2.4353]      0      0
## 
## Test for subgroup differences (random effects model):
##                      Q d.f. p-value
## Between groups   12.83    1  0.0003
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat.pop %>%
  drop_na(fu) %>%
  filter(pop == "Japanese") %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.15
dat.pop %>%
  drop_na(fu) %>%
  filter(pop == "Non-Japanese") %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.90
dat.pop %>%
  mutate(age = coalesce(age_mean, age_med)) %>%
  filter(pop == "Japanese") %>%
  drop_na(age) %>%
  summarise(mean = mean(age, na.rm = TRUE)) 
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  64.5
dat.pop %>%
  mutate(age = coalesce(age_mean, age_med)) %>%
  filter(pop == "Non-Japanese") %>%
  drop_na(age) %>%
  summarise(mean = mean(age, na.rm = TRUE))
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  56.7
dat.age.diff <- dat.pop %>%
  mutate(age = coalesce(age_mean, age_med)) %>%
  drop_na(age) %>%
  select(pop, age)

t.test(age ~ pop, data = dat.age.diff)
## 
##  Welch Two Sample t-test
## 
## data:  age by pop
## t = -4.2288, df = 18.381, p-value = 0.0004846
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.718563  -3.947151
## sample estimates:
## mean in group Non-Japanese     mean in group Japanese 
##                   56.66714                   64.50000
forest(pes.summary.glmm.pop,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap.left = "10mm",
       colgap.forest.left = "10mm",
       colgap.forest.right = "0mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture Proportion per 100",
       pooled.events = TRUE,
       test.subgroup.random = TRUE,
       label.test.effect.subgroup.random = TRUE,
       JAMA.pval = TRUE,
       text.addline1 = "Mean Japanese study-level follow up: 3.2 years",
       text.addline2 = "Mean Non-Japanese study-level follow up: 3.9 years",
       ) 

The test for subgroup differences (random effects model) indicates that there is a statistically significant subgroup effect (p = 0.0003), suggesting that a Japanese source population was associated with a higher rupture risk across the included studies.

This covariate also explains some of the heterogeneity, with only minimal residual heterogeneity remaining (I^2 = 0.0% [0.0%; 26.7%]). The overall covariate distribution is not concerning, given the number of studies and participants with aneurysms in each sub-group.

This result is interesting from a clinical perspective, since the sub-group rupture risk in the Japanese population was higher (1.9494 per 100 [1.5590; 2.4353]) over a shorter mean follow up period of 3.2 years, compared to the sub-group rupture risk in the non-Japanese population of 1.1033 per 100 [0.8410; 1.4463] over a mean study follow up period of 3.9 years.

Notably, this may be attributable to the older mean age in the Japanese samples (64.5 years in Japan compared with 56.7 years elsewhere, p = 0.0005).

Note that there may be additional unknown confounders also influencing the results of the subgroup analysis, further reducing its ability to provide additional useful findings. Moreover, this does not apply to the individual patient, since differences in risk estimates may be confounded by unmeasured or unreported factors such as the prevalence of hypertension and smoking.

Sub-group analysis based on study type - retrospective compared to prospective

dat.type <- dat %>%
  slice(-12)

pes.summary.glmm.type = metaprop(rupt, total_size,
                            data=dat.type,
                            studlab=paste(auth_year),
                            byvar = type,
                            bylab = "Study Type",
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm.type
##                  events            95%-CI    Study Type
## Bor 2015         0.7444 [0.2535;  2.1655]   prospective
## Broderick 2009   1.8519 [0.5093;  6.5019]   prospective
## Burns 2009       0.5780 [0.1021;  3.2011] retrospective
## Byoun 2016       1.7628 [0.9871;  3.1288] retrospective
## Choi 2018        0.5780 [0.1021;  3.2011] retrospective
## Gibbs 2004       0.0000 [0.0000; 14.8655] retrospective
## Gondar 2016      0.8152 [0.2776;  2.3691]   prospective
## Guresir 2013     0.7812 [0.2660;  2.2715] retrospective
## Irazabal 2011    0.0000 [0.0000;  7.8652] retrospective
## Jeon 2014        0.3521 [0.0966;  1.2746] retrospective
## Jiang 2013       0.0000 [0.0000;  7.1348] retrospective
## Loumiotis 2011   0.0000 [0.0000;  2.3446]   prospective
## Matsubara 2004   0.0000 [0.0000;  2.3736] retrospective
## Matsumoto 2013   2.6549 [0.9069;  7.5160] retrospective
## Mizoi 1995       0.0000 [0.0000; 15.4639] retrospective
## Morita 2012      1.8658 [1.4582;  2.3845]   prospective
## Murayama 2016    2.2384 [1.6660;  3.0014]   prospective
## Oh 2013          0.0000 [0.0000; 16.8179] retrospective
## Serrone 2016     0.5155 [0.0911;  2.8615] retrospective
## So 2010          1.1450 [0.3902;  3.3118] retrospective
## Sonobe 2010      1.5625 [0.7589;  3.1897]   prospective
## Teo 2016         2.3810 [0.8130;  6.7666] retrospective
## Tsukahara 2005   3.4722 [1.4921;  7.8703] retrospective
## Tsutsumi 2000    4.3478 [1.4896; 12.0212] retrospective
## Villablanca 2013 1.5544 [0.5300;  4.4697] retrospective
## Wiebers-R 1998   1.3503 [0.8558;  2.1244] retrospective
## Wiebers-P 2003   1.0204 [0.6193;  1.6768]   prospective
## Wilkinson 2018   0.0000 [0.0000; 14.8655] retrospective
## Zylkowski 2015   2.7273 [0.9318;  7.7131] retrospective
## 
## Number of studies combined: k = 29
## 
##                      events           95%-CI
## Fixed effect model   1.4995 [1.3044; 1.7234]
## Random effects model 1.2681 [0.9741; 1.6495]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1209; tau = 0.3476; I^2 = 41.3%; H = 1.31
## 
## Quantifying residual heterogeneity:
##  I^2 = 10.7% [0.0%; 42.9%]; H = 1.06 [1.00; 1.32]
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  30.68   28  0.3313        Wald-type
##  51.52   28  0.0044 Likelihood-Ratio
## 
## Results for subgroups (fixed effect model):
##                              k events           95%-CI     Q   I^2
## Study Type = prospective     8 1.6461 [1.3922; 1.9454] 11.45 54.9%
## Study Type = retrospective  21 1.2492 [0.9711; 1.6056] 18.79 25.7%
## 
## Test for subgroup differences (fixed effect model):
##                    Q d.f. p-value
## Between groups  3.21    1  0.0732
## Within groups  30.24   27  0.3035
## 
## Results for subgroups (random effects model):
##                              k events           95%-CI  tau^2    tau
## Study Type = prospective     8 1.3859 [0.8845; 2.1652] 0.0951 0.3083
## Study Type = retrospective  21 1.1920 [0.8172; 1.7356] 0.1282 0.3580
## 
## Test for subgroup differences (random effects model):
##                     Q d.f. p-value
## Between groups   0.33    1  0.5647
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat %>%
  drop_na(fu) %>%
  filter(type == "prospective") %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  5.17
dat %>%
  drop_na(fu) %>%
  filter(type == "retrospective") %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  4.03
forest(pes.summary.glmm.type,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap.left = "10mm",
       colgap.forest.left = "10mm",
       colgap.forest.right = "0mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture Proportion per 100",
       pooled.events = TRUE,
       test.subgroup.random = TRUE,
       label.test.effect.subgroup.random = TRUE,
       text.addline1 = "Mean prospective study-level follow up: 5.2 years",
       text.addline2 = "Mean retrospective study-level follow up: 4.0 years",
       JAMA.pval = TRUE,
       ) 

The test for subgroup differences (random effects model) indicates that there is no statistically significant subgroup effect (p = 0.5647), suggesting that there was no difference in rupture risk between prospective and retrospective studies. In prospective studies, the point estimate was higher (1.3859 per 100 [0.8845; 2.1652]), with wider confidence intervals over a longer mean follow up period of 5.2 years, compared to retrospective studies, with a point estimate of rupture risk of 1.1920 per 100 [0.8172; 1.7356] over 4.0 years. Although retrospective studies may be at greater risk of incomplete case follow up due to the case-fatality rate prior to hospital presentation, this exploratory sub-group analysis shows no association.

This covariate does not explain the residual heterogeneity, with some heterogeneity remaining (I^2 = 10.7% [0.0%; 42.9%]). There is also some concern regarding the overall covariate distribution, given that the majority of the rupture events are in the prospective cohort, with most rupture events occurring in 2 studies with a Japanese source population. In addition, there may be further unknown confounders influencing the results of the subgroup analysis, reducing its ability to provide additional useful findings. Moreover, this does not apply to the individual patient, but is rather an exploration of the sources of heterogeneity due to methodological differences.

Metaregression to explore study length of follow up on rupture outcomes, including all studies

dat.fu.metareg <- dat %>%
  slice(-12) %>%
  drop_na(fu) %>%
  mutate(fu_year = fu/12) %>%
  select(auth_year, rupt, total_size, fu_year)
fu.metareg = metaprop(rupt, total_size,
                      data=dat.fu.metareg,
                      studlab=paste(auth_year),
                      method="GLMM",
                      sm="PLOGIT",
                      method.tau = "ML", 
                      method.ci = "WS",
                      hakn = TRUE,
                      pscale = 100
                      )

fu.metareg.result <- metareg(fu.metareg, ~fu_year, hakn = TRUE)

fu.metareg.result
## 
## Mixed-Effects Model (k = 26; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity):     0.1301
## tau (square root of estimated tau^2 value):             0.3606
## I^2 (residual heterogeneity / unaccounted variability): 40.4409%
## H^2 (unaccounted variability / sampling variability):   1.6790
## 
## Tests for Residual Heterogeneity:
## Wld(df = 24) = 26.0721, p-val = 0.3495
## LRT(df = 24) = 46.6633, p-val = 0.0037
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 24) = 0.0192, p-val = 0.8910
## 
## Model Results:
## 
##          estimate      se      tval    pval    ci.lb    ci.ub 
## intrcpt   -4.3957  0.2955  -14.8759  <.0001  -5.0056  -3.7859  *** 
## fu_year   -0.0096  0.0692   -0.1385  0.8910  -0.1524   0.1332      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
range(dat.fu.metareg$fu_year) 
## [1] 1.391667 8.300000
mean(dat.fu.metareg$fu_year)
## [1] 3.724776
sum(dat.fu.metareg$rupt)
## [1] 185
sum(dat.fu.metareg$total_size)
## [1] 12639
bubble(fu.metareg.result, 
       xlim=c(0, 10), 
       xlab = "Mean study follow up (years)",
       ylab = "Rupture risk estimate logit transformed proportion",
       lty = 3,
       lwd = 2,
       studlab=dat.fu.metareg$auth_year, 
       cex.studlab = 0.75
       )

Interpretation:

The random effects meta-regression included participants from 26 studies with a mean study-level follow-up time of 3.7 years (range 1.4 - 8.3 years). There were a total of 185 ruptures and 12 639 aneurysms in the studies included in this meta-regression analysis. The meta-regression result was not significant, F(df1 = 1, df2 = 24) = 0.0192, p-val = 0.8910, suggesting that mean study-level follow-up time was not associated with the risk of rupture at the study level. However, there is moderate unexplained residual heterogeneity (I2 = 40.4%), similar to the analysis including all studies, indicating that this meta-regression was not informative in identifying sources of heterogeneity in the meta-analysis result. In addition, this random effects meta-regression is susceptible to confounding bias from other study-level characteristics, and is not generalisable to an individual patient, since aggregation bias arises from the use of study-level data for analysis.
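
The model coefficients are reported on the logit scale; as an illustrative sketch (in Python rather than the R workflow above), the intercept can be back-transformed to events per 100 with the inverse-logit function. Note the intercept is the model prediction at fu_year = 0, an extrapolation beyond the observed follow-up range:

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit (expit): maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

# Intercept from the meta-regression output above (logit scale)
intercept = -4.3957
risk_per_100 = inv_logit(intercept) * 100
print(f"Baseline rupture risk: {risk_per_100:.2f} per 100 aneurysms")  # about 1.22 per 100
```

Because the slope for fu_year is close to zero, this back-transformed intercept sits near the overall pooled estimate.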

Multivariable meta-regression

In the previous scenarios we considered the impact of a single study-level covariate on the outcome of rupture risk. Multivariable meta-regression was not performed due to the risk of introducing false-positive results.

Sensitivity analysis

Sensitivity analysis is used to assess the robustness of the result, and helps strengthen or weaken the conclusions that can be drawn from the meta-analysis. Robustness refers to the sensitivity of the overall result to limitations of the data, assumptions, and statistical approaches. This helps answer the question: what if key inputs or assumptions were changed?

The goal of sensitivity analysis is to repeat the analyses while substituting alternative decisions. When sensitivity analyses show that the meta-analysis results are not altered by alternative decisions made during the systematic review or data synthesis, the results are considered more robust. Conversely, if sensitivity analyses show that the meta-analysis results are affected, then the overall results should be interpreted with caution, and could instead be considered hypothesis generating for additional research.

The difference between sensitivity analysis and subgroup analysis is that in sensitivity analysis there is no assessment of the pooled effect in the removed studies, and no formal statistical procedure to compare the included and removed studies. Instead, an informal comparison is made by recalculating the pooled effect size.

The sensitivity analyses to consider are determined by the meta-analyst. General options include including or excluding additional studies based on search strategy, inclusion criteria, or study-level characteristics (clinical or methodological), and varying the handling of outliers, handling of missing data, statistical procedures, and / or outcome measures.

The following sensitivity analyses will be performed.

  1. Sensitivity analysis using the leave-one-out method, with studies ordered by impact on the pooled rupture risk estimate and residual I2 indicated.

  2. Sensitivity analysis using a ≤7mm SIUA size threshold.

  3. Sensitivity analysis using a ≤5mm SIUA size threshold.

  4. Sensitivity analysis using a ≤3mm SIUA size threshold.

  5. Sensitivity analysis including the outlier study.

  6. Sensitivity analysis limited to good quality studies.

Sensitivity analysis - consider leave-one-out on effect size

inf.analysis <- InfluenceAnalysis(x = pes.summary.glmm, 
                                  random = TRUE,
                                  subplot.heights = c(30,18),
                                  subplot.widths = c(30,30),
                                  forest.lims = c(0,0.03))
plot(inf.analysis, "es")

plot(inf.analysis, "i2")

Here there is no clinically relevant change in the pooled rupture risk estimate - the 95% CIs remain within 1-2%.
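
The leave-one-out procedure above refits the meta-analysis k times, each time omitting one study. A simplified sketch of the idea (in Python, using a plain inverse-variance pooled mean on the logit scale rather than the GLMM, so the numbers are illustrative only and the effect/variance values are hypothetical):

```python
def pool_inverse_variance(effects, variances):
    """Fixed-effect inverse-variance pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances):
    """Re-pool the effect k times, omitting one study each time."""
    results = []
    for i in range(len(effects)):
        e = effects[:i] + effects[i + 1:]
        v = variances[:i] + variances[i + 1:]
        results.append(pool_inverse_variance(e, v))
    return results

# Hypothetical logit-scale effects and variances for four studies
effects = [-4.6, -4.2, -4.4, -3.9]
variances = [0.20, 0.10, 0.15, 0.30]
print(leave_one_out(effects, variances))
```

A study whose omission moves the pooled estimate well outside the remaining spread is flagged as influential.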

Sensitivity analysis - consider decision to utilise 10 mm and less or 7 mm and less for aneurysm size categories

sizedata7 <- filter(maindata, size == 7) 
dat7 <- sizedata7 %>%
  mutate(prop_multi = multi_tot / num_tot,
         num_multi = prop_multi * num + num,
         num_multi_temp = coalesce(num_anr, num_multi),
         total_size_temp = coalesce(num_anr, num),
         total_size_temp_2 = coalesce(num_multi_temp, total_size_temp),
         total_size = round(total_size_temp_2, 0),
         psah_size_temp = psah * prop_multi + psah,
         prop_psah = psah_tot / num_tot,
         num_anr_psah = prop_psah * total_size,
         size_psah_temp = coalesce(psah_size_temp, num_anr_psah),
         psah_size = round(size_psah_temp, 0),
         ) %>%
  mutate(fu = coalesce(fu_mean_tot,fu_med_tot)) %>% 
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  mutate(pop = fct_collapse(sizedata7$country,
                            "Japanese" = "Japan", 
                            "Non-Japanese" = c("International", "United States", "Switzerland", "Australia", "Korea", "Poland", "China", "Germany", "United Kingdom", "Finland"))
         )

Meta-analysis of proportions for aneurysms 7 mm and less using GLMM and Wilson CIs

dat7 <- dat7 %>%
  slice(-12) %>%
  drop_na(total_size) %>%
  drop_na(rupt)

pes.summary.glmm7 = metaprop(rupt, total_size,
                            data=dat7,
                            studlab=paste(auth_year),
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm7
##                  events            95%-CI
## Bor 2015         0.4963 [0.1362;  1.7912]
## Broderick 2009   1.8519 [0.5093;  6.5019]
## Byoun 2016       1.8613 [1.0424;  3.3018]
## Choi 2018        0.5780 [0.1021;  3.2011]
## Gibbs 2004       0.0000 [0.0000; 14.8655]
## Gondar 2016      0.8152 [0.2776;  2.3691]
## Guresir 2013     0.7812 [0.2660;  2.2715]
## Irazabal 2011    0.0000 [0.0000;  8.2010]
## Jeon 2014        0.3521 [0.0966;  1.2746]
## Jiang 2013       0.0000 [0.0000;  8.7622]
## Matsubara 2004   0.0000 [0.0000;  2.9815]
## Matsumoto 2013   0.8850 [0.1564;  4.8431]
## Mizoi 1995       0.0000 [0.0000; 15.4639]
## Morita 2012      1.2933 [0.9397;  1.7774]
## Murayama 2016    2.0679 [1.5164;  2.8142]
## Oh 2013          0.0000 [0.0000; 21.5311]
## Serrone 2016     0.0000 [0.0000;  1.9417]
## So 2010          0.4695 [0.0829;  2.6110]
## Sonobe 2010      1.5625 [0.7589;  3.1897]
## Teo 2016         2.3810 [0.8130;  6.7666]
## Tsukahara 2005   1.8692 [0.5141;  6.5604]
## Tsutsumi 2000    3.3333 [0.9189; 11.3638]
## Villablanca 2013 1.5544 [0.5300;  4.4697]
## Wiebers-P 2003   0.6122 [0.3224;  1.1595]
## Wilkinson 2018   0.0000 [0.0000; 16.1125]
## Zylkowski 2015   2.7273 [0.9318;  7.7131]
## 
## Number of studies combined: k = 26
## 
##                      events           95%-CI
## Fixed effect model   1.2288 [1.0363; 1.4565]
## Random effects model 1.0424 [0.7598; 1.4285]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.1674; tau = 0.4092; I^2 = 42.3%; H = 1.32
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##  27.57   25  0.3280        Wald-type
##  44.40   25  0.0098 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat7 %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.62

Publication quality forest plots for 7mm and less

forest(pes.summary.glmm7,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 3.6 years",
       JAMA.pval = TRUE,
       ) 

The main result summary estimate is 1.2681 [0.9741; 1.6495] with I2 of 41.3% over a study-level mean follow-up of 3.7 years.

Sensitivity analysis performed with aneurysms measuring 7 mm and less gives a pooled rupture risk estimate of 1.0424 [0.7598; 1.4285] with I2 of 42.3% over a study-level mean follow-up of 3.6 years.

This sensitivity analysis does not alter the overall impact from a clinical perspective, since the pooled rupture risk remains within 1-2%.

Sensitivity analysis - consider decision to utilise 10 mm and less or 5 mm and less for aneurysm size categories

dat5 <- dat5 %>%
  slice(-12) %>%
  drop_na(total_size) %>%
  drop_na(rupt)

pes.summary.glmm5 = metaprop(rupt, total_size,
                            data=dat5,
                            studlab=paste(auth_year),
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm5
##                events            95%-CI
## Bor 2015       0.7435 [0.2041;  2.6699]
## Broderick 2009 1.3889 [0.2456;  7.4566]
## Gibbs 2004     0.0000 [0.0000; 16.8179]
## Gondar 2016    0.9091 [0.3096;  2.6383]
## Guresir 2013   1.0563 [0.3599;  3.0592]
## Irazabal 2011  0.0000 [0.0000; 10.1515]
## Jeon 2014      0.3521 [0.0966;  1.2746]
## Jiang 2013     0.0000 [0.0000; 16.8179]
## Matsubara 2004 0.0000 [0.0000;  2.9815]
## Matsumoto 2013 1.2658 [0.2238;  6.8276]
## Mizoi 1995     0.0000 [0.0000; 15.4639]
## Morita 2012    1.1500 [0.7675;  1.7198]
## Murayama 2016  1.2813 [0.8477;  1.9325]
## Oh 2013        0.0000 [0.0000; 22.8095]
## Serrone 2016   0.0000 [0.0000;  7.4100]
## So 2010        0.4695 [0.0829;  2.6110]
## Sonobe 2010    1.5625 [0.7589;  3.1897]
## Tsukahara 2005 1.8692 [0.5141;  6.5604]
## Tsutsumi 2000  3.3333 [0.9189; 11.3638]
## Wilkinson 2018 0.0000 [0.0000; 16.8179]
## Zylkowski 2015 2.1739 [0.5982;  7.5835]
## 
## Number of studies combined: k = 21
## 
##                      events           95%-CI
## Fixed effect model   1.0861 [0.8616; 1.3684]
## Random effects model 1.0861 [0.8616; 1.3684]
## 
## Quantifying heterogeneity:
##  tau^2 = 0; tau = 0; I^2 = 0.0%; H = 1.00
## 
## Test of heterogeneity:
##      Q d.f. p-value             Test
##   8.53   20  0.9877        Wald-type
##  16.43   20  0.6893 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat5 %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.66

Publication quality forest plots for 5mm and less

forest(pes.summary.glmm5,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 3.7 years",
       JAMA.pval = TRUE,
       ) 

The main result summary estimate is 1.2681 [0.9741; 1.6495] with I2 of 41.3% over a study-level mean follow-up of 3.7 years.

Sensitivity analysis performed with aneurysms measuring 5 mm and less gives a pooled rupture risk estimate of 1.0861 [0.8616; 1.3684] over 3.7 years, with I^2 = 0%. This sensitivity analysis does not alter the overall impact from a clinical perspective, since the pooled rupture risk remains within 1-2%.

Sensitivity analysis for ≤3mm aneurysms

sizedata3 <- filter(maindata, size == 3) 
dat3 <- sizedata3 %>%
  mutate(prop_multi = multi_tot / num_tot,
         num_multi = prop_multi * num + num,
         num_multi_temp = coalesce(num_anr, num_multi),
         total_size_temp = coalesce(num_anr, num),
         total_size_temp_2 = coalesce(num_multi_temp, total_size_temp),
         total_size = round(total_size_temp_2, 0),
         psah_size_temp = psah * prop_multi + psah,
         prop_psah = psah_tot / num_tot,
         num_anr_psah = prop_psah * total_size,
         size_psah_temp = coalesce(psah_size_temp, num_anr_psah),
         psah_size = round(size_psah_temp, 0),
         ) %>%
  mutate(fu = coalesce(fu_mean_tot,fu_med_tot)) %>% 
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  mutate(pop = fct_collapse(sizedata3$country,
                            "Japanese" = "Japan", 
                            "Non-Japanese" = c("International", "United States", "Switzerland", "Australia", "Korea", "Poland", "China", "Germany", "United Kingdom", "Finland"))
         )
dat3 <- dat3 %>%
  slice(-12) %>%
  drop_na(total_size) %>%
  drop_na(rupt)

pes.summary.glmm3 = metaprop(rupt, total_size,
                            data=dat3,
                            studlab=paste(auth_year),
                            method="GLMM",
                            sm="PLOGIT",
                            method.tau = "ML", 
                            method.ci = "WS",
                            hakn = TRUE,
                            pscale = 100
                            ) 
pes.summary.glmm3
##                events            95%-CI
## Bor 2015       3.0303 [0.5369; 15.3187]
## Broderick 2009 1.3889 [0.2456;  7.4566]
## Gibbs 2004     0.0000 [0.0000; 24.2494]
## Gondar 2016    0.5376 [0.0950;  2.9821]
## Guresir 2013   1.6000 [0.4399;  5.6463]
## Irazabal 2011  0.0000 [0.0000; 15.4639]
## Jiang 2013     0.0000 [0.0000; 16.8179]
## Matsumoto 2013 0.0000 [0.0000;  4.6371]
## Mizoi 1995     0.0000 [0.0000; 15.4639]
## Oh 2013        0.0000 [0.0000; 56.1497]
## Serrone 2016   0.0000 [0.0000;  7.4100]
## So 2010        1.6393 [0.2900;  8.7189]
## Sonobe 2010    1.6129 [0.4434;  5.6903]
## Wilkinson 2018 0.0000 [0.0000; 17.5879]
## Zylkowski 2015 0.0000 [0.0000; 11.6970]
## 
## Number of studies combined: k = 15
## 
##                      events           95%-CI
## Fixed effect model   0.9401 [0.4708; 1.8683]
## Random effects model 0.9401 [0.4708; 1.8683]
## 
## Quantifying heterogeneity:
##  tau^2 = 0; tau = 0; I^2 = 0.0%; H = 1.00
## 
## Test of heterogeneity:
##     Q d.f. p-value             Test
##  1.60   14  1.0000        Wald-type
##  7.46   14  0.9156 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat3 %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  3.97

Publication quality forest plots for 3mm and less

forest(pes.summary.glmm3,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 4.0 years",
       JAMA.pval = TRUE,
       ) 

The main result summary estimate is 1.2681 [0.9741; 1.6495] with I2 of 41.3% over a study-level mean follow-up of 3.7 years.

Sensitivity analysis performed with aneurysms measuring 3 mm and less gives a pooled rupture risk estimate of 0.9401 [0.4708; 1.8683] over 4.0 years, with I^2 = 0%. This sensitivity analysis is limited by the small number of rupture events and SIUAs. Nonetheless, it does not alter the overall impact from a clinical perspective, since the pooled rupture risk remains within 1-2%.

Sensitivity analysis - consider decision to exclude outlier studies

Juvela et al was identified as an outlier, contributing the most to statistical heterogeneity and influencing the result.

We can perform a sensitivity analysis including all studies identified, and informally compare by recalculating the pooled effect size.

This has been done above, and is repeated for completeness below.

pes.summary.glmm.all
##                   events             95%-CI
## Bor 2015          0.7444 [ 0.2535;  2.1655]
## Broderick 2009    1.8519 [ 0.5093;  6.5019]
## Burns 2009        0.5780 [ 0.1021;  3.2011]
## Byoun 2016        1.7628 [ 0.9871;  3.1288]
## Choi 2018         0.5780 [ 0.1021;  3.2011]
## Gibbs 2004        0.0000 [ 0.0000; 14.8655]
## Gondar 2016       0.8152 [ 0.2776;  2.3691]
## Guresir 2013      0.7812 [ 0.2660;  2.2715]
## Irazabal 2011     0.0000 [ 0.0000;  7.8652]
## Jeon 2014         0.3521 [ 0.0966;  1.2746]
## Jiang 2013        0.0000 [ 0.0000;  7.1348]
## Juvela 2013      18.6747 [13.4796; 25.2868]
## Loumiotis 2011    0.0000 [ 0.0000;  2.3446]
## Matsubara 2004    0.0000 [ 0.0000;  2.3736]
## Matsumoto 2013    2.6549 [ 0.9069;  7.5160]
## Mizoi 1995        0.0000 [ 0.0000; 15.4639]
## Morita 2012       1.8658 [ 1.4582;  2.3845]
## Murayama 2016     2.2384 [ 1.6660;  3.0014]
## Oh 2013           0.0000 [ 0.0000; 16.8179]
## Serrone 2016      0.5155 [ 0.0911;  2.8615]
## So 2010           1.1450 [ 0.3902;  3.3118]
## Sonobe 2010       1.5625 [ 0.7589;  3.1897]
## Teo 2016          2.3810 [ 0.8130;  6.7666]
## Tsukahara 2005    3.4722 [ 1.4921;  7.8703]
## Tsutsumi 2000     4.3478 [ 1.4896; 12.0212]
## Villablanca 2013  1.5544 [ 0.5300;  4.4697]
## Wiebers-R 1998    1.3503 [ 0.8558;  2.1244]
## Wiebers-P 2003    1.0204 [ 0.6193;  1.6768]
## Wilkinson 2018    0.0000 [ 0.0000; 14.8655]
## Zylkowski 2015    2.7273 [ 0.9318;  7.7131]
## 
## Number of studies combined: k = 30
## 
##                      events           95%-CI
## Fixed effect model   1.7160 [1.5077; 1.9525]
## Random effects model 1.2378 [0.8149; 1.8760]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.7571; tau = 0.8701; I^2 = 83.2%; H = 2.44
## 
## Test of heterogeneity:
##       Q d.f.  p-value             Test
##  179.23   29 < 0.0001        Wald-type
##  152.15   29 < 0.0001 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations

The main result summary estimate is 1.2681 [0.9741; 1.6495] with I2 of 41.3% over a study-level mean follow-up of 3.7 years.

The pooled rupture risk with Juvela et al is 1.2378 [0.8149; 1.8760]

The impact of including Juvela et al is to reduce the precision of the summary estimate, with wider confidence intervals. The inconsistency in the result increases from an I2 of 41% to 83%, making the result less generalisable. Regardless, from a clinical perspective this sensitivity analysis does not alter the overall impact, since the pooled rupture risk remains within 1-2%.

Sensitivity analysis that includes risk of bias

Quality assessment was performed using the Newcastle-Ottawa Scale as recommended by Cochrane; this is also the scale most commonly utilised for observational studies.

This can be converted to the standards of the Agency for Healthcare Research and Quality within the United States Department of Health and Human Services (AHRQ) using the following thresholds.

Good quality: 3 or 4 stars in selection domain AND 1 or 2 stars in comparability domain AND 2 or 3 stars in outcome/exposure domain

Fair quality: 2 stars in selection domain AND 1 or 2 stars in comparability domain AND 2 or 3 stars in outcome/exposure domain

Poor quality: 0 or 1 star in selection domain OR 0 stars in comparability domain OR 0 or 1 stars in outcome/exposure domain
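
The three thresholds above amount to a simple deterministic rule over the NOS domain star counts, sketched here in Python for clarity (illustrative only, not part of the R workflow):

```python
def ahrq_quality(selection: int, comparability: int, outcome: int) -> str:
    """Map Newcastle-Ottawa Scale domain stars to AHRQ quality categories."""
    if selection <= 1 or comparability == 0 or outcome <= 1:
        return "Poor"
    if selection in (3, 4) and comparability in (1, 2) and outcome in (2, 3):
        return "Good"
    if selection == 2 and comparability in (1, 2) and outcome in (2, 3):
        return "Fair"
    return "Poor"  # unreachable for valid star counts, kept for safety

print(ahrq_quality(4, 2, 3))  # Good
print(ahrq_quality(2, 1, 2))  # Fair
print(ahrq_quality(1, 2, 3))  # Poor
```

The filter chain below selects exactly the "Good" combinations from the study-level data.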

dat.sens.nos <- dat %>%
  slice(-12)%>%
  filter(nos_select == 3 | nos_select == 4 ) %>%
  filter(nos_compare == 1 | nos_compare == 2) %>%
  filter(nos_outcome == 2 | nos_outcome == 3) %>%
  mutate(ahrq = "Good")

dat.sens.nos <- dat.sens.nos %>%
  mutate(total_size = coalesce(num_anr,num)) %>%
  drop_na(total_size) %>% 
  unite(auth_year, c(auth, pub), sep = " ", remove = FALSE) %>%
  select(auth_year, fu, rupt, total_size)
pes.sens.nos = metaprop(rupt, total_size,
                        data=dat.sens.nos,
                        studlab=paste(auth_year),
                        method="GLMM",
                        sm="PLOGIT",
                        method.tau = "ML", 
                        method.ci = "WS",
                        hakn = TRUE,
                        pscale = 100
                        ) 
pes.sens.nos
##                events            95%-CI
## Irazabal 2011  0.0000 [0.0000;  7.8652]
## Matsumoto 2013 2.6549 [0.9069;  7.5160]
## Mizoi 1995     0.0000 [0.0000; 15.4639]
## Morita 2012    1.8658 [1.4582;  2.3845]
## Murayama 2016  2.2384 [1.6660;  3.0014]
## Serrone 2016   0.5155 [0.0911;  2.8615]
## So 2010        1.1450 [0.3902;  3.3118]
## Sonobe 2010    1.5625 [0.7589;  3.1897]
## Wiebers-R 1998 1.6901 [1.0717;  2.6558]
## Wiebers-P 2003 1.4299 [0.8684;  2.3458]
## 
## Number of studies combined: k = 10
## 
##                      events           95%-CI
## Fixed effect model   1.8007 [1.5379; 2.1075]
## Random effects model 1.8007 [1.5379; 2.1075]
## 
## Quantifying heterogeneity:
##  tau^2 = 0; tau = 0; I^2 = 0.0%; H = 1.00
## 
## Test of heterogeneity:
##     Q d.f. p-value             Test
##  5.72    9  0.7674        Wald-type
##  9.17    9  0.4218 Likelihood-Ratio
## 
## Details on meta-analytical method:
## - Random intercept logistic regression model
## - Maximum-likelihood estimator for tau^2
## - Hartung-Knapp adjustment for random effects model
## - Logit transformation
## - Wilson Score confidence interval for individual studies
## - Continuity correction of 0.5 in studies with zero cell frequencies
##   (only used to calculate individual study results)
## - Events per 100 observations
dat.sens.nos %>%
  drop_na(fu) %>%
  summarise(mean = mean(fu, na.rm = TRUE) / 12)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  4.27
forest(pes.sens.nos,
       layout = "meta",
       comb.fixed = FALSE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 4.3-years",
       JAMA.pval = TRUE,
       ) 

Here we can see how the point estimate and the confidence intervals vary:

The main result summary estimate is 1.2681 [0.9741; 1.6495] with I2 of 41.3% over a study-level mean follow-up of 3.7 years.

Good quality studies 1.8007 [1.5379; 2.1075] with I2 of 0% over 4.3 years.

Regardless, there is no clinically relevant impact on the pooled rupture risk estimate.

Non-reporting bias

Non-reporting bias leads to bias in the meta-analysis due to missing results. Studies with rupture events are more likely to be reported and published than those with no ruptures. These missing studies are not reported, not identified, and not integrated into the meta-analysis. The resulting non-reporting bias means the calculated effect size may be higher than the true effect size, since studies with fewer events were less likely to be reported. In addition, there may be bias in the selection of the reported result, where study authors choose a particular result for reporting from among many potential results.

Thus, when assessing for non-reporting bias, conventional assessment focuses on identifying whether small studies with small effect sizes are missing. This can be performed using a funnel plot.

funnel(pes.summary.glmm,
       studlab = TRUE,
       cex.studlab = 0.5,
       pos.studlab = 3,
       )

eggers.test(pes.summary.glmm)
##              Intercept ConfidenceInterval      t       p
## Egger's test    -0.619      -1.207--0.031 -2.071 0.04801

There are many possible sources of funnel plot asymmetry:

  1. Non-reporting bias - bias from small studies that are not published, and selective outcome reporting bias within studies
  2. Small study effects from poor methodological quality, leading to spuriously elevated effects in small studies
  3. True heterogeneity, statistical modelling artefacts, and chance.

In general, testing for funnel plot asymmetry should always be performed in the context of visual assessment, and while there are many potential statistical tests for funnel plot asymmetry, including Egger 1997, Harbord 2006, Peters 2006 and Rücker 2008, even Cochrane suggests that such tests should be used in only a minority of meta-analyses (Ioannidis 2007b).
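
Egger's test is, at its core, a regression of the standardised effect on precision, with the intercept capturing asymmetry. A minimal, unweighted sketch of the idea (Python, illustrative only; the eggers.test call above handles the details, and the toy data are hypothetical):

```python
def egger_intercept(effects, std_errors):
    """Simple Egger-style regression: standardised effect ~ precision.
    Returns (intercept, slope); an intercept far from zero suggests
    funnel plot asymmetry."""
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Perfectly symmetric toy data: every study estimates the same effect,
# so the intercept is ~0 and the slope recovers the common effect
intercept, slope = egger_intercept([0.8] * 4, [0.1, 0.2, 0.3, 0.4])
print(intercept, slope)  # intercept ~ 0, slope ~ 0.8
```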

Interpretation:

Visual inspection of the funnel plot reveals asymmetry, likely due to non-reporting bias, small study effects, and clinical and methodological heterogeneity. This is supported by Egger's test.

Small study effects

Small study effects can be assessed with an additional sensitivity analysis comparing the random and fixed effect models. In the setting of heterogeneity, the random effects estimate is weighted relatively more towards the results of the smaller studies. We can investigate whether small study effects cause a clinically relevant change in the point estimate and CIs.
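
The fixed vs random contrast can be sketched with a DerSimonian-Laird calculation (Python, illustrative; metaprop above uses ML estimation of tau^2 rather than DL). When Q does not exceed its degrees of freedom, tau^2 is zero and the two models coincide:

```python
def pool_fixed_random(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird
    random-effects pooled estimates."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # DL estimator
    w_star = [1.0 / (v + tau2) for v in variances]   # flattened weights
    random_ = sum(wi * ei for wi, ei in zip(w_star, effects)) / sum(w_star)
    return fixed, random_, tau2

# Homogeneous toy data: identical effects, so tau^2 = 0 and the
# fixed- and random-effects estimates agree exactly
print(pool_fixed_random([1.0, 1.0, 1.0], [0.1, 0.2, 0.3]))
```

With heterogeneity (tau^2 > 0), the random effects weights are flattened towards equality, which is why small studies pull the random effects estimate more than the fixed effect estimate.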

The easiest way is to review our forest plot and include the fixed effect model estimate.

forest(pes.summary.glmm,
       layout = "meta",
       comb.fixed = TRUE,
       comb.random = TRUE,
       print.I2.ci = TRUE,
       colgap = "7mm",
       leftlabs = c("Study", "Ruptures", "Total"),
       rightcols = c("effect", "ci"),
       rightlabs = c("Ruptures per 100", "95% CI"),
       smlab = " ",
       xlim=c(0,10),
       xlab = "Rupture per 100 aneurysms",
       pooled.events = TRUE,
       text.addline1 = "Mean study-level follow up: 3.6 years",
       JAMA.pval = TRUE,
       ) 

From the main forest plot we can compare the outcome under the fixed effect vs random effects assumption.

Fixed effect model 1.50 [1.30; 1.72]; Random effects model 1.27 [0.97; 1.65]

While the random effects model estimate is lower, the confidence intervals overlap and there is little clinically relevant difference. Thus this sensitivity analysis shows that small-study effects have little effect on the pooled rupture risk estimate.

Limitations

Non-reporting bias may arise from the inclusion of results from the published literature only; however, we did include results from multiple electronic databases and the Cochrane trials register.

Risk of bias from the studies included in the systematic review, given that grey literature and non-English language studies were excluded. While in general there are no major differences in meta-analytic estimates when restricted to English-language studies compared with those that include non-English studies (Morrison et al 2012; Dechartres et al 2018), there is a risk of bias given the significant sub-group result in the Japanese population.

Risk of bias from data synthesis due to non-reporting bias from selective reporting within individual studies, and the potential lack of publication of smaller studies without rupture events. Furthermore, single imputation methods such as last observation carried forward may have led to downward-biased rupture risk estimates and under-estimation of the true variability.

The associations derived from these meta-regressions are observational. Since they have not been derived from randomised comparison, they cannot be considered causal, and are instead hypothesis generating. Proportions of patient characteristics have been derived from each study and utilised as covariates in the meta-regression. This is particularly important when considering categorical factors such as source population: meta-regression relates the change in the rupture risk estimate to the change in the proportion of Japanese participants included. This does not apply to the individual patient, and differences in the risk estimate may be confounded by unmeasured factors such as the prevalence of hypertension and smoking.

Meta-regression may fail to capture within-study variation across a covariate because some of the between-study averages, such as mean age, have little variation. In addition, not all studies reported all relevant covariates. This can introduce additional bias if information related to a relevant interaction is reported while factors not considered relevant go unreported.

In addition, aggregate study-level meta-regression may fail to identify clinically relevant associations that could be identified by analysis of patient-level data (Berlin et al, Statist. Med. 2002). However, this is not always the case; if the same studies are included, study-level analysis and patient-level analysis may produce the same result, and study-level analysis is considered more cost-efficient (Olkin et al, Biometrics 1998). In this meta-analysis, there is moderate heterogeneity in pooled rupture risk, and the covariates selected, such as the study proportion with exposure to prior subarachnoid haemorrhage and aneurysm size, varied across the studies by methodological design. These increase the ability to explore associations using study-level covariates. Moreover, the limitations of using aggregate study-level data have been minimised by including a large number of studies, rupture events, and aneurysms. Furthermore, we considered only one potential covariate at a time, and included it in a random effects meta-regression analysis, recognising that there will be residual heterogeneity. Regardless, more detailed sub-group analysis to provide more nuanced risk estimates for additional clinical and radiological factors, such as smoking status and anatomical location, will require meta-analysis of patient-level data.

The lack of availability of such individual patient data for pooling remains a large barrier to advancing knowledge of rupture risk for the individual patient. Initiatives such as the NINDS common data elements, and perhaps a requirement that all publicly funded datasets become available in a central repository, will in future help unlock the ability to provide more personalised management for patients who discover that they harbour an unruptured cerebral aneurysm.

Study conclusions

For every 1000 SIUAs selected for surveillance, the pooled rupture risk is estimated at between 10 and 17 over a mean follow-up of 3.6 years. To further improve the certainty of the rupture risk estimate in sub-groups during follow-up, large individual participant datasets are needed to develop and validate patient-level characteristics that are associated with rupture.

Exploratory analysis - consider distributional assumptions

While we have considered our rationale for a GLMM, we can perform an additional sensitivity analysis using the generic inverse variance method with the Freeman-Tukey double arcsine transformation (sm = "PFT").
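
The sm = "PFT" option applies the Freeman-Tukey double arcsine transformation, which stabilises the variance of a proportion and accommodates zero-event studies without a continuity correction. A minimal sketch of the transform (Python, illustrative; the exact convention follows the Freeman-Tukey / Miller formulation, which is one of several in the literature):

```python
import math

def freeman_tukey(events: int, n: int) -> float:
    """Freeman-Tukey double arcsine transform of a proportion.
    Well-defined and non-degenerate even when events == 0."""
    return 0.5 * (math.asin(math.sqrt(events / (n + 1)))
                  + math.asin(math.sqrt((events + 1) / (n + 1))))

# Zero-event studies still get a usable transformed value
print(freeman_tukey(0, 50))  # > 0, no continuity correction needed
print(freeman_tukey(1, 50))
```

This is one reason a transformation-based inverse-variance analysis is a reasonable cross-check against the GLMM, which models the binomial counts directly.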

dat.dl <- dat %>%
  slice(-12)   # exclude row 12 (Juvela), consistent with the primary analysis

pes.summary.glmm.dl <- metaprop(rupt, total_size,
                                data = dat.dl,
                                studlab = paste(auth_year),
                                method = "Inverse",   # generic inverse variance
                                sm = "PFT",           # Freeman-Tukey double arcsine
                                method.tau = "ML",    # maximum-likelihood tau^2
                                method.ci = "WS",     # Wilson score CIs per study
                                hakn = TRUE,          # Hartung-Knapp adjustment
                                pscale = 100          # events per 100 observations
                                )
pes.summary.glmm.dl
##                  events            95%-CI %W(fixed) %W(random)
## Bor 2015         0.7444 [0.2535;  2.1655]       3.1        4.7
## Broderick 2009   1.8519 [0.5093;  6.5019]       0.8        1.7
## Burns 2009       0.5780 [0.1021;  3.2011]       1.3        2.6
## Byoun 2016       1.7628 [0.9871;  3.1288]       4.8        6.0
## Choi 2018        0.5780 [0.1021;  3.2011]       1.3        2.6
## Gibbs 2004       0.0000 [0.0000; 14.8655]       0.2        0.4
## Gondar 2016      0.8152 [0.2776;  2.3691]       2.8        4.4
## Guresir 2013     0.7812 [0.2660;  2.2715]       3.0        4.6
## Irazabal 2011    0.0000 [0.0000;  7.8652]       0.3        0.8
## Jeon 2014        0.3521 [0.0966;  1.2746]       4.4        5.7
## Jiang 2013       0.0000 [0.0000;  7.1348]       0.4        0.9
## Loumiotis 2011   0.0000 [0.0000;  2.3446]       1.2        2.4
## Matsubara 2004   0.0000 [0.0000;  2.3736]       1.2        2.4
## Matsumoto 2013   2.6549 [0.9069;  7.5160]       0.9        1.8
## Mizoi 1995       0.0000 [0.0000; 15.4639]       0.2        0.4
## Morita 2012      1.8658 [1.4582;  2.3845]      25.5       10.4
## Murayama 2016    2.2384 [1.6660;  3.0014]      14.8        9.3
## Oh 2013          0.0000 [0.0000; 16.8179]       0.1        0.4
## Serrone 2016     0.5155 [0.0911;  2.8615]       1.5        2.8
## So 2010          1.1450 [0.3902;  3.3118]       2.0        3.5
## Sonobe 2010      1.5625 [0.7589;  3.1897]       3.4        5.0
## Teo 2016         2.3810 [0.8130;  6.7666]       1.0        2.0
## Tsukahara 2005   3.4722 [1.4921;  7.8703]       1.1        2.2
## Tsutsumi 2000    4.3478 [1.4896; 12.0212]       0.5        1.2
## Villablanca 2013 1.5544 [0.5300;  4.4697]       1.5        2.8
## Wiebers-R 1998   1.3503 [0.8558;  2.1244]      10.2        8.3
## Wiebers-P 2003   1.0204 [0.6193;  1.6768]      11.3        8.6
## Wilkinson 2018   0.0000 [0.0000; 14.8655]       0.2        0.4
## Zylkowski 2015   2.7273 [0.9318;  7.7131]       0.8        1.8
## 
## Number of studies combined: k = 29
## 
##                      events           95%-CI
## Fixed effect model   0.9869 [0.7914; 1.1985]
## Random effects model 0.8580 [0.5695; 1.1894]
## 
## Quantifying heterogeneity:
##  tau^2 = 0.0004 [0.0000; 0.0017]; tau = 0.0192 [0.0000; 0.0407];
##  I^2 = 38.1% [3.1%; 60.5%]; H = 1.27 [1.02; 1.59]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  45.26   28  0.0208
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
## - Freeman-Tukey double arcsine transformation
## - Wilson Score confidence interval for individual studies
## - Events per 100 observations

The published literature contains various recommendations on the use of transformations in meta-analysis of proportions. Schwarzer et al. (2019), however, reported seriously misleading results when sample sizes span a wide range, because the Miller back-transformation of the Freeman-Tukey transform requires a single sample size, usually taken as the harmonic mean.
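This issue can be made concrete in base R. The sketch below applies Miller's back-transformation to the same pooled transformed value with two different choices of the single sample size; `t_pooled` is an arbitrary illustrative value, not an estimate from this review. The resulting proportions differ markedly, which is the mechanism behind the misleading results when sample sizes vary widely.

```r
# Freeman-Tukey double-arcsine transform of x events in n observations
ft <- function(x, n) {
  0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))
}

# Miller's back-transformation: recovering a proportion from a transformed
# value t requires choosing a single sample size n (usually the harmonic mean)
miller_inv <- function(t, n) {
  s <- sin(2 * t)
  0.5 * (1 - sign(cos(2 * t)) * sqrt(1 - (s + (s - 1 / s) / n)^2))
}

# Arbitrary pooled transformed value (illustrative assumption only)
t_pooled <- 0.1

# The same pooled value implies very different proportions depending on n:
miller_inv(t_pooled, 50)    # ~0.0026 with n = 50
miller_inv(t_pooled, 2000)  # ~0.0097 with n = 2000
```

With a wide range of sample sizes the harmonic mean is a poor single summary of n, so the back-transformed pooled proportion can sit far from what any reasonable choice of n would give, which motivates the GLMM and logit-based approaches used in the primary analysis.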

Baseline study characteristics

sizedata10 <- sizedata10 %>%
  mutate(fu_tot = coalesce(fu_mean_tot, fu_med_tot),  # prefer mean follow-up, fall back to median
         prop_psah = psah_tot / num_tot,              # proportion with prior SAH
         mid_year = start + (end - start)/2           # midpoint of recruitment period
         )

sizedata10 %>%
  filter(country == "Japan" | country == "United States" | country == "International") %>%
  filter(type == "prospective")
## # A tibble: 7 x 32
##   auth    pub start   end country type  num_tot fem_tot fam_tot multi_tot
##   <chr> <dbl> <dbl> <dbl> <chr>   <chr>   <dbl>   <dbl>   <dbl>     <dbl>
## 1 Bor    2015    NA    NA Intern… pros…     363     280      72       163
## 2 Brod…  2009    NA    NA Intern… pros…     113      75     113        NA
## 3 Loum…  2011  2008  2011 United… pros…     125      92      NA        22
## 4 Mori…  2012  2001  2004 Japan   pros…    2998    1994     342       824
## 5 Mura…  2016  2003  2012 Japan   pros…    1556    1060     205       707
## 6 Sono…  2010  2000  2004 Japan   pros…     374     238      31       124
## 7 Wieb…  2003  1991  1998 Intern… pros…    1692    1261     276       679
## # … with 22 more variables: age_mean <dbl>, age_med <dbl>, num_anr_tot <dbl>,
## #   psah_tot <dbl>, smoke_tot <dbl>, fu_mean_tot <dbl>, fu_med_tot <dbl>,
## #   size <dbl>, num <dbl>, num_anr <dbl>, psah <dbl>, fu <dbl>, rupt <dbl>,
## #   rupt_psah <dbl>, rupt_fatal <dbl>, nos_select <dbl>, nos_compare <dbl>,
## #   nos_outcome <dbl>, nos_total <dbl>, fu_tot <dbl>, prop_psah <dbl>,
## #   mid_year <dbl>
sizedata10 %>%
  filter(pub<2010)
## # A tibble: 9 x 32
##   auth    pub start   end country type  num_tot fem_tot fam_tot multi_tot
##   <chr> <dbl> <dbl> <dbl> <chr>   <chr>   <dbl>   <dbl>   <dbl>     <dbl>
## 1 Brod…  2009    NA    NA Intern… pros…     113      75     113        NA
## 2 Burns  2009  1987  2006 United… retr…     165     120      20        46
## 3 Gibbs  2004  1989  2002 United… retr…      21      14      14         1
## 4 Mats…  2004    NA    NA Japan   retr…     140      94      32        51
## 5 Mizoi  1995  1984  1993 Japan   retr…      49      NA      NA        NA
## 6 Tsuk…  2005  1999  2001 Intern… retr…     181      NA      NA        NA
## 7 Tsut…  2000  1976  1997 Japan   retr…      62      33      NA        14
## 8 Wieb…  1998  1970  1991 Intern… retr…    1449    1052      NA       364
## 9 Wieb…  2003  1991  1998 Intern… pros…    1692    1261     276       679
## # … with 22 more variables: age_mean <dbl>, age_med <dbl>, num_anr_tot <dbl>,
## #   psah_tot <dbl>, smoke_tot <dbl>, fu_mean_tot <dbl>, fu_med_tot <dbl>,
## #   size <dbl>, num <dbl>, num_anr <dbl>, psah <dbl>, fu <dbl>, rupt <dbl>,
## #   rupt_psah <dbl>, rupt_fatal <dbl>, nos_select <dbl>, nos_compare <dbl>,
## #   nos_outcome <dbl>, nos_total <dbl>, fu_tot <dbl>, prop_psah <dbl>,
## #   mid_year <dbl>
sum(sizedata10$num_tot, na.rm = TRUE)
## [1] 11598
fem10 <- sizedata10 %>%
  drop_na(fem_tot)
sum(fem10$num_tot, na.rm = TRUE)
## [1] 10827
sum(fem10$fem_tot, na.rm = TRUE)
## [1] 7622
mean(sizedata10$age_mean, na.rm = TRUE)
## [1] 58.30778
sd(sizedata10$age_mean, na.rm = TRUE)
## [1] 7.065263
mean(sizedata10$fu_tot, na.rm = TRUE)
## [1] 52.37519
min(sizedata10$fu_tot, na.rm = TRUE)
## [1] 16.7
max(sizedata10$fu_tot, na.rm = TRUE)
## [1] 252
mean(sizedata10$prop_psah, na.rm = TRUE)
## [1] 0.1422311
min(sizedata10$prop_psah, na.rm = TRUE)
## [1] 0
max(sizedata10$prop_psah, na.rm = TRUE)
## [1] 0.9225352
sizedata10.juvela <- sizedata10 %>%
  slice(-12)   # exclude row 12 (Juvela)

mean(sizedata10.juvela$prop_psah, na.rm = TRUE)
## [1] 0.1133309
min(sizedata10.juvela$prop_psah, na.rm = TRUE)
## [1] 0
max(sizedata10.juvela$prop_psah, na.rm = TRUE)
## [1] 0.4982747
mean(sizedata10.juvela$fu_tot, na.rm = TRUE)
## [1] 44.69731
min(sizedata10.juvela$fu_tot, na.rm = TRUE)
## [1] 16.7
max(sizedata10.juvela$fu_tot, na.rm = TRUE)
## [1] 99.6
median(sizedata10$mid_year, na.rm = TRUE)
## [1] 2003.25
min(sizedata10$mid_year, na.rm = TRUE)
## [1] 1967
max(sizedata10$mid_year, na.rm = TRUE)
## [1] 2010
median(sizedata10.juvela$mid_year, na.rm = TRUE)
## [1] 2004
sizedata10.juvela %>%
  filter(start<1978)
## # A tibble: 2 x 32
##   auth    pub start   end country type  num_tot fem_tot fam_tot multi_tot
##   <chr> <dbl> <dbl> <dbl> <chr>   <chr>   <dbl>   <dbl>   <dbl>     <dbl>
## 1 Tsut…  2000  1976  1997 Japan   retr…      62      33      NA        14
## 2 Wieb…  1998  1970  1991 Intern… retr…    1449    1052      NA       364
## # … with 22 more variables: age_mean <dbl>, age_med <dbl>, num_anr_tot <dbl>,
## #   psah_tot <dbl>, smoke_tot <dbl>, fu_mean_tot <dbl>, fu_med_tot <dbl>,
## #   size <dbl>, num <dbl>, num_anr <dbl>, psah <dbl>, fu <dbl>, rupt <dbl>,
## #   rupt_psah <dbl>, rupt_fatal <dbl>, nos_select <dbl>, nos_compare <dbl>,
## #   nos_outcome <dbl>, nos_total <dbl>, fu_tot <dbl>, prop_psah <dbl>,
## #   mid_year <dbl>
sum(dat.all$num_tot, na.rm = TRUE)
## [1] 11598
sum(dat.all$num_anr_tot, na.rm = TRUE)
## [1] 15831